Development of QSARs for parameterizing Physiology Based ToxicoKinetic models


Accepted Manuscript

Dimosthenis Α. Sarigiannis, Krystalia Papadaki, Periklis Kontoroupis, Spyros P. Karakitsios

PII: S0278-6915(17)30257-0
DOI: 10.1016/j.fct.2017.05.029
Reference: FCT 9064
To appear in: Food and Chemical Toxicology
Received Date: 6 March 2017
Revised Date: 13 April 2017
Accepted Date: 14 May 2017

Please cite this article as: Sarigiannis, D.Α., Papadaki, K., Kontoroupis, P., Karakitsios, S.P., Development of QSARs for parameterizing Physiology Based ToxicoKinetic models, Food and Chemical Toxicology (2017), doi: 10.1016/j.fct.2017.05.029.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Dimosthenis Α. Sarigiannis1,2,3*, Krystalia Papadaki1, Periklis Kontoroupis1, Spyros P. Karakitsios1,2,3

1 Aristotle University of Thessaloniki, Department of Chemical Engineering, Environmental Engineering Laboratory, 54124, Thessaloniki, Greece

2 Centre for Research and Technology Hellas, Chemical Process and Energy Resources Institute, Natural and Renewable Resource Exploitation Laboratory, 57001, Thessaloniki, Greece

3 Institute for Advanced Study (IUSS), Piazza della Vittoria 15, 27100 Pavia, Italy

*CORRESPONDING AUTHOR
Prof. Dimosthenis Sarigiannis
Director, Environmental Engineering Laboratory
Department of Chemical Engineering
Aristotle University of Thessaloniki
University Campus, Bldg. D, Rm 201
54124 Thessaloniki, Greece
tel. +30-2310-994562
e-mail/ [email protected]

Abstract

A Quantitative Structure Activity Relationship (QSAR) model was developed to predict physicochemical and biochemical properties of industrial chemicals from various groups. The model is based on the solvation equation originally proposed by Abraham. In this work, Abraham’s solvation model was parameterized using artificial intelligence techniques, namely artificial neural networks (ANNs), to predict partitioning into kidney, heart, adipose, liver, muscle, brain and lung, and to estimate the bodyweight-normalized maximal metabolic velocity (Vmax) and the Michaelis – Menten constant (Km). Model parameterization using ANNs was compared to the use of non-linear regression (NLR) for organic chemicals. The coupling of ANNs with Abraham’s solvation equation resulted in a model with strong predictive power (R² up to 0.95) for both partitioning and biokinetic parameters. The proposed model outperformed other QSAR models found in the literature, especially with regard to the estimation and prediction of key biokinetic parameters such as Km. The results show that the physicochemical descriptors used in the model successfully describe the complex interactions of the micro-processes governing chemical distribution and metabolism in human tissues. Moreover, ANNs provide a more flexible mathematical framework for capturing non-linear biochemical and biological interactions than regression techniques.

Highlights:

- Abraham solvation equation predicts physicochemical and biochemical constants with high accuracy
- ANN models outperform the regression based techniques
- “Non-linear” interactions occurring in biological systems are captured
- The most critical descriptor was found to be the McGowan volume for most of the parameters

Keywords: QSARs, Artificial Neural Networks, Abraham’s solvation equation, tissue/ blood partition coefficients, metabolic constants

List of abbreviations

Pow: Octanol/ water partition coefficient
Fltissue: Fractional content of lipids in tissue
Fwtissue: Fractional content of water in tissue
Flblood: Fractional content of lipids in blood
Fwblood: Fractional content of water in blood
Vmaxc: Maximal velocity of metabolism normalized to the body weight (µmol/h/kg)
Km: Michaelis – Menten constant (µmol/L)
SP: Biological property
E: Excess molar refractivity (cm3/mol/10)
S: Dipolarity / polarizability
A: Effective or summation hydrogen bond acidity
B: Effective or summation hydrogen bond basicity
V: McGowan characteristic volume (cm3/mol/100)
e: Measure of the tendency of a chemical to interact with solute π- and n- electrons
s: Measure of the polarizability of a chemical
a: Measure of the hydrogen bond acidity of a chemical
b: Measure of the hydrogen bond basicity of a chemical
v: Measure of the lipophilicity of a chemical
β: Nonlinear parameter (c, e, s, a, b, v)
ε: Error term

1 Introduction

Physiologically Based ToxicoKinetic (PBTK) models provide quantitative descriptions of the Absorption, Distribution, Metabolism and Excretion (ADME) of chemicals in biota based on the interrelationships among the physiological, biochemical and physicochemical determinants of these processes (Peyret and Krishnan, 2011; WHO, 2010). The input parameters required for solving the PBTK model equations are either species-specific or chemical-specific and should reflect biological or mechanistic determinants of the ADME of the chemical being modeled. The species-specific parameters relate to alveolar ventilation rate, cardiac output, tissue/ blood flow rates and tissue volumes. The chemical-specific inputs include partition coefficients (blood/ air, tissue/ air or tissue/ blood (Pt/b)), as well as metabolic parameters such as maximal velocity of metabolism (Vmax), Michaelis – Menten constant (Km) and intrinsic clearance. These parameters can be obtained from independent in vitro or in vivo measurements, which in many cases are time-consuming and expensive (Béliveau et al., 2005; Peyret and Krishnan, 2011; Peyret et al., 2010; Schmitt, 2008). The application of PBTK models in risk assessment is limited because these models lack a generic character. A critical limiting factor in describing ADME processes across a large chemical space is the proper parameterization for “data poor” compounds.

To expand the applicability of PBTK models to as much of the chemical space as possible, the input parameters of these models are predicted using advanced Quantitative Structure-Activity Relationships (QSARs).

In silico approaches, including QSARs, are widely used to estimate the above physicochemical and biochemical properties and biological effects, as well as to understand the physicochemical features governing a biological response (Puzyn et al., 2010). QSARs are regression or classification models that relate the biological effects of a chemical compound to its chemistry (Puzyn et al., 2010). Several approaches incorporating QSARs have been proposed for the prediction of partition coefficients for PBTK modeling. The algorithm of Poulin and Krishnan (1995a) calculates the tissue/ blood partition coefficients of organic compounds by dividing the estimated solubility of a chemical in tissues by its estimated solubility in blood, using only information on water, neutral lipid and phospholipid content. The predictions of tissue/ blood partition coefficients were improved with the use of solubility or partitioning data in vegetable oil instead of n-octanol (Poulin and Krishnan, 1995). Furthermore, DeJongh et al. (1997) estimated tissue/ blood partition coefficients of organic compounds, as the equilibrium distribution between the specific tissue and blood, on the basis of the Po/w value. Partitioning of non-ionizable chemicals between seven human tissues and blood was modeled by Baláž and Lukáčová (1999) using an explicit term for protein binding. A nonlinear model equation based on tissue composition in lipids, proteins, and water was developed for the estimation of tissue/ blood partition coefficients of neutral and ionized compounds (Zhang, 2004, 2005; Zhang and Zhang, 2006). Peyret, Poulin and Krishnan (2010) developed a unified algorithm to predict tissue/ blood partition coefficients of drugs and environmental chemicals, considering both the macro (whole tissue and blood) and the micro (cells and fluids) levels.

The approaches for predicting metabolic rates have been related to CYP-mediated metabolism and have focused on the identification of substrate specificity and the estimation of Vmax, Km, CLint or CLh. Galliani et al. (1984) calculated Vmax and Km for the microsomal N-demethylation of para-substituted N,N-dimethylanilines in terms of the lipophilicity and electronic effect of the substituent. Knaak et al. (2004) estimated Vmax and Km for organophosphorus pesticides using data from Wolcott, Vaughn and Neal (Wolcott and Neal, 1972; Wolcott et al., 1972). Lewis et al. (2003) calculated the maximal velocity for eight alkyl benzenes undergoing oxidative metabolism via human CYP2E1. As far as clearance is concerned, Beliveau et al. (2003) successfully estimated hepatic clearance for different volatile organic chemicals using the group contribution method of Gao et al. (1992). This approach was also followed by Price and Krishnan (2011) for the determination of the metabolic constants Vmax and Km for a group of volatile organic compounds. Several studies related the partitioning between blood, air and tissues to properties of chemicals using the solvation equation of Abraham and co-workers (Abraham, 1993; Abraham and Acree Jr, 2013; Abraham et al., 1994a; Abraham et al., 1994b; Abraham et al., 2006; Abraham et al., 2013; Abraham et al., 2015; Abraham et al., 1999; Abraham and Weathersby, 1994). Nevertheless, few studies have addressed the quantitative prediction of the kinetic parameters of environmental chemicals (Dimitriou-Christidis et al., 2008).

Several statistical techniques have been applied to QSAR models and were reviewed by Ventura et al. (2013), including Multiple Linear Regression (MLR) (Hansch, 1969; Silipo and Hansch, 1975), Artificial Neural Networks (ANNs) (Aoyama et al., 1990; Zupan and Gasteiger, 1999), Decision Trees (Breiman et al., 1984), Random Forests (Breiman, 2001), Partial Least Squares (Eriksson et al., 2003; Wold et al., 2001), Principal Components Analysis (Wold et al., 1987) and Support Vector Machines (Cristianini and Shawe-Taylor, 2000). ANNs overcame some of the frailties of the MLR approach in the design of new drugs, especially because of their ability to deal with non-linear relationships of high complexity when performing input-output transformations. However, to our knowledge, ANNs have not yet been applied in QSAR studies for ADME properties of environmental chemicals. Based on the above, the objective of this study was to develop QSAR models for expanding the applicability domain of generic PBTK models. These models provide a parameterization of tissue/ blood partition coefficients and metabolic rate constants, accomplished by combining the parameters of Abraham’s solvation equation with ANN models. In this way it was possible to develop a simple and unified QSAR model, which could be used to predict both physicochemical and biochemical properties and could be applied to various chemical groups.

2 Materials and Methods

2.1 Overall Approach

The methodological approach presented herein extends the applicability of Abraham’s solvation equation by incorporating a large number of chemical compounds to address the biological properties of several human tissues. This involved the collection of the necessary input data, a statistical analysis and the implementation of the model for a large number of chemical compounds. The equation was analyzed using two statistical techniques: Non Linear Regression and Artificial Neural Networks. Modeling results from the two methods were compared to corresponding literature studies for environmental chemicals. The model with the best statistical performance was applied to several chemical groups in order to expand the domain of applicability.

2.2 Solvation Equation

Abraham’s solvation equation (a Linear Free Energy Relationship, LFER) describes the transfer of chemicals from the liquid phase to a large number of solvents or other condensed phases, including biophases. The descriptors that characterize these physicochemical and biochemical phenomena are combined in equation 1,

log SP = c + e ⋅ E + s ⋅ S + a ⋅ A + b ⋅ B + v ⋅ V    (1)

where SP is a biological property for a series of chemicals in a given system. The independent descriptors are properties of the examined chemicals: E is the excess molar refractivity of the chemical, S is the chemical’s dipolarity/polarizability, A and B are the chemical’s effective or summation hydrogen bond acidity and basicity, respectively, and V is the McGowan characteristic volume of the chemical (Abraham, 1993; Payne and Kenny, 2002). The coefficients c, e, s, a, b and v are fitted for each system and reflect its complementary properties: e corresponds to the tendency of the system to interact with solute π- and n- electrons, s to its dipolarity/polarizability, a and b to its hydrogen bond basicity and acidity, respectively, and v is a measure of its lipophilicity.
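Equation 1 can be evaluated directly once a set of coefficients is available. The sketch below is illustrative only: the class and function names are hypothetical, the logarithm is assumed to be base 10, and the coefficient values are the kidney/blood NLR estimates as read from Table 2 of this manuscript, applied to the benzene descriptors of Table 1.

```python
# Minimal sketch of equation 1 (Abraham's solvation equation).
# Names are illustrative; log is assumed to be base 10.
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Solute descriptors of one chemical."""
    E: float  # excess molar refractivity
    S: float  # dipolarity/polarizability
    A: float  # hydrogen bond acidity
    B: float  # hydrogen bond basicity
    V: float  # McGowan characteristic volume


@dataclass(frozen=True)
class Coefficients:
    """Fitted coefficients of equation 1 for one property SP."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def log_sp(d: Descriptors, k: Coefficients) -> float:
    """Return log SP = c + e*E + s*S + a*A + b*B + v*V."""
    return k.c + k.e * d.E + k.s * d.S + k.a * d.A + k.b * d.B + k.v * d.V


def sp(d: Descriptors, k: Coefficients) -> float:
    """Return SP on the natural scale (assumes a base-10 logarithm)."""
    return 10.0 ** log_sp(d, k)


# Benzene descriptors as listed in Table 1.
benzene = Descriptors(E=0.610, S=0.520, A=0.000, B=0.140, V=0.7164)
# Kidney/blood NLR coefficients as read from Table 2.
kidney_nlr = Coefficients(c=-0.34, e=0.21, s=-0.40, a=0.24, b=-0.62, v=1.02)

print(f"log P(kidney/blood) = {log_sp(benzene, kidney_nlr):.3f}")
print(f"P(kidney/blood)     = {sp(benzene, kidney_nlr):.2f}")
```

With these inputs the result lands close to the NLR prediction of 1.66 listed for benzene in Table 3; the small difference is expected because the tabulated coefficients are rounded to two decimals.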

This equation was used as the method of analysis in the present study, where the dependent property SP corresponded to the tissue (kidney, heart, adipose, liver, muscle, brain, lung)/ blood partition coefficient (Pt/b), the maximal velocity of metabolism normalized to human bodyweight (Vmaxc), or the Michaelis – Menten constant (Km).
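To show how the two metabolic constants are typically used once predicted, the following sketch evaluates a saturable metabolic rate. The manuscript does not spell out the rate law, so the standard Michaelis – Menten form is assumed here; the function name is hypothetical, and the benzene values are the experimental Vmaxc and Km from Table 1.

```python
# Sketch of a saturable (Michaelis-Menten) metabolic rate term.
# Assumption: rate = Vmaxc * C / (Km + C), with C in the same units as Km.

def metabolic_rate(conc: float, vmaxc: float, km: float) -> float:
    """Rate of metabolism per kg body weight (µmol/h/kg) at concentration conc (µmol/L)."""
    if conc < 0 or vmaxc < 0 or km <= 0:
        raise ValueError("conc and vmaxc must be non-negative and km positive")
    return vmaxc * conc / (km + conc)


# Experimental benzene constants from Table 1.
VMAXC_BENZENE = 27.040  # µmol/h/kg
KM_BENZENE = 1.279      # µmol/L

for conc in (0.1, KM_BENZENE, 10.0, 100.0):
    rate = metabolic_rate(conc, VMAXC_BENZENE, KM_BENZENE)
    print(f"C = {conc:7.3f} µmol/L -> rate = {rate:6.3f} µmol/h/kg")
```

At a concentration equal to Km the rate is half of Vmaxc, and at high concentrations it approaches Vmaxc, which is why both constants are needed to describe metabolism over a range of exposures.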

2.3 Data Collection

Applying Abraham’s equation to predict tissue/ blood partition coefficients and kinetic parameters of metabolism required experimental values of these parameters and the molecular descriptors of the equation. The experimental tissue/ blood partition coefficients were obtained from Baláž and Lukáčová (1999), DeJongh et al. (1997) and Pelekis and Krishnan (2004). The experimental data for the metabolic constants, Vmaxc and Km, were extracted from Price and Krishnan (2011), while the numerical values of the physicochemical descriptors were taken from Abraham and co-workers (Abraham et al., 1994a; Abraham et al., 1994b; Abraham et al., 1999; Sprunger et al., 2008) and from the online database “Open Notebook Science” (ONS, 2016). The collected partition coefficients, metabolic constants and physicochemical descriptors of the compounds are listed in Table 1.

Table 1. Experimental Values of Tissue/ Blood Partition Coefficients (Pt/b), Metabolic Constants (Vmaxc and Km) and Physicochemical Descriptors of the Examined Chemicals.

Columns Kidney to Lung: tissue/ blood partition coefficients. Vmaxc and Km: metabolic constants. E, S, A, B, V: physicochemical descriptors. A dash (-) denotes no value.

Chemical Compound | Kidney | Heart | Adipose | Liver | Muscle | Brain | Lung | Vmaxc (µmol/h/kg) | Km (µmol/L) | E | S | A | B | V
1,1,1,2-Tetrachloroethane | - | - | - | - | - | - | - | 38.726 | 5.483 | 0.542 | 0.630 | 0.100 | 0.080 | 0.8800
1,1,1-Trichloroethane | 2.04 | 2.75 | 75.86 | 4.90 | 2.33 | 2.51 | 1.41 | - | - | 0.369 | 0.410 | 0.000 | 0.090 | 0.7576
1,1,2,2-Tetrachloroethane | - | - | - | - | - | - | - | 71.450 | 4.775 | 0.595 | 0.760 | 0.160 | 0.120 | 0.8800
1,1,2-Trichloroethane | - | - | - | - | - | - | - | 57.677 | 5.848 | 0.499 | 0.680 | 0.130 | 0.130 | 0.7576
1,1-Dichloroethane | - | - | - | - | - | - | - | 75.858 | 2.018 | 0.322 | 0.490 | 0.100 | 0.100 | 0.6352
1,2,4-Trimethylbenzene | - | - | - | - | - | - | - | 72.444 | 14.322 | 0.677 | 0.560 | 0.000 | 0.190 | 1.1391
1,2-Dichloroethane | - | - | - | - | - | - | - | 31.769 | 2.529 | 0.416 | 0.640 | 0.100 | 0.110 | 0.6352
1,3-Butadiene | - | - | - | - | - | - | - | 61.944 | 3.750 | 0.320 | 0.230 | 0.000 | 0.100 | 0.5862
1-Butanol | 0.69 | 0.79 | 0.72 | 0.83 | 0.63 | 0.62 | 0.74 | - | - | 0.224 | 0.420 | 0.370 | 0.480 | 0.7309
1-Chloro-2,2,2-trifluoroethane | 0.69 | 2.40 | 22.91 | 1.32 | 1.48 | 1.20 | 1.48 | - | - | 0.010 | 0.400 | 0.150 | 0.000 | 0.5659
1-Chloro-2,2-difluoroethene | 1.15 | 2.66 | 22.91 | 1.26 | 1.15 | 1.26 | 1.15 | - | - | -0.340 | 0.290 | 0.150 | 0.000 | 0.5482
1-Propanol | 0.72 | 0.71 | 0.31 | 0.74 | 0.71 | 0.60 | 0.85 | - | - | 0.236 | 0.420 | 0.370 | 0.480 | 0.5900
2,2-Dimethylbutane | 5.37 | 1.91 | 251.19 | 13.49 | 3.80 | 10.72 | 2.29 | - | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.9540
2-Methylpentane | 4.90 | 3.39 | 213.80 | 10.96 | 7.08 | 9.33 | 1.95 | 3.281 | 11.830 | 0.000 | 0.000 | 0.000 | 0.000 | 0.9540
2-Propanol | 0.69 | 0.72 | 0.25 | 0.74 | 0.69 | 0.60 | 0.81 | - | - | 0.212 | 0.360 | 0.330 | 0.560 | 0.5900
3-Methylhexane | 5.62 | 4.07 | 213.80 | 8.51 | 8.51 | 7.76 | 2.66 | - | - | 0.000 | 0.000 | 0.000 | 0.000 | 1.0949
3-Methylpentane | 5.75 | 4.47 | 239.88 | 11.48 | 8.91 | 10.23 | 2.09 | - | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.9540
Benzene | 1.86 | 2.19 | 66.07 | 3.63 | 2.51 | 2.82 | 1.86 | 27.040 | 1.279 | 0.610 | 0.520 | 0.000 | 0.140 | 0.7164
Bromodichloromethane | - | - | - | - | - | - | - | 65.766 | 1.841 | 0.593 | 0.690 | 0.100 | 0.040 | 0.6693
Chloroethane | - | - | - | - | - | - | - | 61.944 | 1.549 | 0.227 | 0.400 | 0.000 | 0.100 | 0.5128
Chloromethane | - | - | - | - | - | - | - | 111.686 | 19.815 | 0.249 | 0.430 | 0.000 | 0.080 | 0.3719
cis-1,2-Dichloroethene | - | - | - | - | - | - | - | 30.903 | 5.164 | 0.436 | 0.610 | 0.110 | 0.050 | 0.5922
Cyclohexane | 5.50 | 4.47 | 199.53 | 8.51 | 7.76 | 8.51 | 2.09 | - | - | 0.305 | 0.100 | 0.000 | 0.000 | 0.8454
Cyclopropane | 1.41 | 2.12 | 28.18 | 1.55 | 1.32 | 1.20 | 0.87 | - | - | 0.408 | 0.230 | 0.000 | 0.000 | 0.4227
Decane | - | - | - | - | - | - | - | 7.943 | 10.544 | 0.000 | 0.000 | 0.000 | 0.000 | 1.5176
Dichloromethane | 0.98 | 0.72 | 14.13 | 1.20 | 0.79 | 1.00 | 0.98 | 47.098 | 4.710 | 0.387 | 0.570 | 0.100 | 0.050 | 0.4943
Diethylether | 0.91 | 1.00 | 5.89 | 0.91 | 0.83 | 1.07 | 0.85 | - | - | 0.041 | 0.250 | 0.000 | 0.450 | 0.7309
Divinyl ether | 0.68 | 1.92 | 15.85 | 1.12 | 0.91 | 1.45 | 0.89 | - | - | 0.259 | 0.390 | 0.000 | 0.130 | 0.6449
Enflurane | 1.91 | 4.62 | 74.13 | 2.82 | 2.69 | 1.82 | 2.00 | - | - | -0.230 | 0.400 | 0.120 | 0.130 | 0.8009
Ethanol | 0.71 | 0.69 | 0.16 | 0.71 | 0.65 | 0.59 | 0.87 | - | - | 0.246 | 0.420 | 0.370 | 0.480 | 0.4491
Ethylbenzene | - | - | - | - | - | - | - | 60.256 | 9.795 | 0.613 | 0.510 | 0.000 | 0.150 | 0.9982
Fluroxene | 0.93 | 2.80 | 24.55 | 1.45 | 1.45 | 1.45 | 0.98 | - | - | 0.183 | 0.300 | 0.000 | 0.270 | 0.7410
Halothane | - | - | - | - | - | - | - | 46.666 | 2.028 | 0.102 | 0.380 | 0.150 | 0.050 | 0.4709
Heptane | 4.68 | 3.24 | 204.17 | 5.75 | 6.31 | 6.31 | 1.32 | 12.417 | 23.550 | 0.000 | 0.000 | 0.000 | 0.000 | 1.0949
Hexachloroethane | - | - | - | - | - | - | - | 8.395 | 3.381 | 0.680 | 0.680 | 0.000 | 0.000 | 1.1248
Hexane | 3.72 | 3.47 | 128.82 | 6.46 | 6.31 | 6.31 | 1.26 | 14.757 | 42.364 | 0.000 | 0.000 | 0.000 | 0.000 | 0.9540
Isoflurane | 1.51 | 3.70 | 48.98 | 2.95 | 1.70 | 2.09 | 1.15 | - | - | -0.240 | 0.500 | 0.100 | 0.100 | 0.8009
Methanol | 0.83 | 0.72 | 0.14 | 0.74 | 0.81 | 0.68 | 1.07 | - | - | 0.278 | 0.440 | 0.430 | 0.470 | 0.3082
Methoxyflurane | 1.51 | 3.21 | 63.10 | 2.51 | 1.95 | 1.95 | 1.15 | - | - | 0.109 | 0.670 | 0.070 | 0.140 | 0.8700
Methylcyclohexane | - | - | - | - | - | - | - | 16.444 | 35.237 | 0.244 | 0.060 | 0.000 | 0.000 | 0.8454
Methylcyclopentane | 5.50 | 2.19 | 204.17 | 9.12 | 5.75 | 8.51 | 2.00 | - | - | 0.225 | 0.100 | 0.000 | 0.000 | 0.8454
Nonane | - | - | - | - | - | - | - | 4.634 | 6.934 | 0.000 | 0.000 | 0.000 | 0.000 | 1.3767
Octane | - | - | - | - | - | - | - | 6.194 | 10.423 | 0.000 | 0.000 | 0.000 | 0.000 | 1.2358
Pentane | 1.58 | 0.52 | 104.71 | 5.50 | 1.86 | 5.75 | 1.32 | - | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.8131
Propanone | 0.74 | 0.79 | 0.44 | 0.79 | 0.78 | 0.62 | 0.81 | - | - | 0.179 | 0.700 | 0.040 | 0.490 | 0.5470
Sevoflurane | 3.31 | 6.50 | 72.44 | 5.01 | 2.51 | 2.19 | 2.19 | - | - | -0.465 | 0.232 | 0.080 | 0.147 | 0.8548
Styrene | 4.83 | 8.38 | 50.12 | 2.69 | 1.00 | 8.28 | 2.03 | 97.724 | 3.837 | 0.849 | 0.650 | 0.000 | 0.160 | 0.9552
Teflurane | 2.50 | 4.26 | 33.11 | 1.66 | 3.80 | 1.82 | 1.10 | - | - | -0.070 | 0.210 | 0.200 | 0.020 | 0.6360
Tetrachloroethene | - | - | - | - | - | - | - | 5.598 | 33.884 | 0.639 | 0.440 | 0.000 | 0.000 | 0.8370
Toluene | 1.74 | 2.88 | 93.33 | 4.68 | 3.39 | 3.55 | 2.04 | 37.325 | 1.409 | 0.601 | 0.520 | 0.000 | 0.140 | 0.8573
trans-1,2-Dichloroethene | - | - | - | - | - | - | - | 30.903 | 1.030 | 0.425 | 0.410 | 0.090 | 0.050 | 0.5922
Trichloroethene | 1.86 | 2.09 | 70.79 | 3.55 | 2.34 | 2.57 | 1.74 | 83.753 | 1.901 | 0.524 | 0.370 | 0.080 | 0.030 | 0.7146
Trichloromethane | 1.38 | 0.78 | 34.67 | 2.14 | 1.51 | 2.51 | 1.15 | - | - | 0.425 | 0.490 | 0.150 | 0.020 | 0.6167
Vinyl Chloride | - | - | - | - | - | - | - | 39.994 | 1.600 | 0.258 | 0.380 | 0.000 | 0.050 | 0.4698

2.4 Statistical Analysis

The collected data were analysed using two popular statistical methods: Non Linear Regression (NLR) and Artificial Neural Networks (ANN). Both methods were implemented in Matlab®, using the statistics, curve fitting and neural network toolboxes.

2.4.1 Non Linear Regression Analysis

Non Linear Regression was used to estimate the parameters of Abraham’s equation. The physicochemical descriptors of Abraham’s equation were taken from the literature and used as the model inputs. The experimental values of the tissue/ blood partition coefficients and the metabolic parameters were used as the model’s expected output. The generalized form of the nonlinear model is presented in equation 2,

y = f (X , β ) + ε    (2)

where β represents the nonlinear parameters to be computed (c, e, s, a, b, v) and ε the error terms. The algorithm selected for fitting the nonlinear regression to the observations was Least Squares (LS) coupled with the Levenberg-Marquardt algorithm (Moré, 1978). Accordingly, the estimation of the score vectors, as deduced from the latent variables, was accomplished iteratively, as discussed in Dudek et al. (2006).
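The least-squares step can be illustrated with a small, dependency-free sketch. It is not the Matlab routine used in the study: because equation 1 is linear in its coefficients, ordinary least squares via the normal equations stands in here for the Levenberg-Marquardt algorithm. The descriptors are those listed in Table 1 for nine compounds, while the targets are synthetic, generated from the kidney coefficients of Table 2 so that the fit can be checked against known values. All function names are hypothetical.

```python
# Sketch: least-squares estimate of (c, e, s, a, b, v) in equation 1.
# Assumption: ordinary least squares replaces Levenberg-Marquardt, which is
# adequate because the model is linear in its coefficients.

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_lfer(descriptors, log_sp):
    """Return [c, e, s, a, b, v] minimising the squared error in log SP."""
    design = [[1.0, *row] for row in descriptors]  # leading 1 for the intercept c
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, log_sp)) for i in range(p)]
    return solve_linear(xtx, xty)


# (E, S, A, B, V) as listed in Table 1.
DESCRIPTORS = [
    (0.000, 0.000, 0.000, 0.000, 0.8131),   # pentane
    (0.305, 0.100, 0.000, 0.000, 0.8454),   # cyclohexane
    (0.610, 0.520, 0.000, 0.140, 0.7164),   # benzene
    (0.387, 0.570, 0.100, 0.050, 0.4943),   # dichloromethane
    (0.041, 0.250, 0.000, 0.450, 0.7309),   # diethylether
    (0.224, 0.420, 0.370, 0.480, 0.7309),   # 1-butanol
    (0.278, 0.440, 0.430, 0.470, 0.3082),   # methanol
    (0.179, 0.700, 0.040, 0.490, 0.5470),   # propanone
    (-0.230, 0.400, 0.120, 0.130, 0.8009),  # enflurane
]

# Synthetic targets built from the kidney coefficients of Table 2.
TRUE_COEFFS = [-0.34, 0.21, -0.40, 0.24, -0.62, 1.02]
log_sp_values = [
    TRUE_COEFFS[0] + sum(k * d for k, d in zip(TRUE_COEFFS[1:], row))
    for row in DESCRIPTORS
]

fitted = fit_lfer(DESCRIPTORS, log_sp_values)
for name, value in zip("cesabv", fitted):
    print(f"{name} = {value:+.3f}")
```

With noise-free synthetic targets the fit recovers the generating coefficients; with experimental data the same call would return the best-fitting coefficients, and standard errors would need to be derived separately.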

2.4.2 Artificial Neural Networks

Artificial Neural Networks were used to develop a nonlinear model based on Abraham’s solvation equation. A Multi-Layer Perceptron (MLP) model was selected and trained with the scaled conjugate gradient back-propagation algorithm. The network consisted of a single input layer, fed with the values of the molecular descriptors and the experimental values of the physicochemical and biochemical parameters, one hidden layer, and an output layer with a log-sigmoid transfer function. The scaled conjugate gradient back-propagation algorithm was selected because of its good convergence rate on the limited sample size used. A graphical representation of this MLP model is presented in Figure 1.

Figure 1. Graphical representation of the generic ANN model
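The structure of such a network can be sketched as a forward pass. This is an illustration under several assumptions rather than the trained model: the five descriptors are taken as the inputs, the hidden layer is given five nodes and a log-sigmoid transfer function (the text only specifies log-sigmoid for the output layer), the weights are random placeholders, and the output is assumed to represent a target scaled to the interval (0, 1).

```python
# Sketch of a one-hidden-layer MLP forward pass with log-sigmoid units.
# Weights are hypothetical placeholders, not the values of Tables A1-A18.
import math
import random


def logsig(x: float) -> float:
    """Log-sigmoid transfer function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Propagate one input vector through the hidden and output layers."""
    hidden = [
        logsig(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return logsig(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


N_INPUTS = 5   # E, S, A, B, V
N_HIDDEN = 5   # assumed number of hidden nodes

rng = random.Random(0)  # fixed seed so the placeholder weights are repeatable
w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_out = rng.uniform(-1, 1)

benzene = [0.610, 0.520, 0.000, 0.140, 0.7164]  # descriptors from Table 1
scaled_output = forward(benzene, w_hidden, b_hidden, w_out, b_out)
print(f"scaled network output: {scaled_output:.4f}")
```

Training would adjust the weights and biases to minimise the mean squared error; the scaled conjugate gradient algorithm used in the study is not reproduced here.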

The optimal model structure of 5 nodes was selected under the assumptions that the model errors are independent and identically distributed following a normal distribution, and that the boundary condition to the derivative of the log-likelihood with respect to the true variance is zero. ANN model training was initially accomplished by dividing the input data into training (70% of the total data), validation (15%) and testing (15%) sets. The scaled conjugate gradient back-propagation algorithm updated the weights and bias values over a total of 300 epochs, using the mean squared error as the performance criterion. A number of statistical metrics were used to evaluate performance, as presented in Tables B1 and B2 in the appendix. The estimated error, i.e. the difference between the observed and the predicted values, was found to follow a normal distribution in all test cases, with a mean close to zero. Furthermore, the presence of systematic error in the NLR and ANN models was investigated by plotting the residuals of the predicted values against the experimental values (Figures D1-D3 for the NLR and D4-D6 for the ANN). The residuals were randomly distributed on both sides of the zero line, which indicated that there is no systematic error, at least for the ANN models. In addition, the residual autocorrelation plots, presented in Figures E1-E3 for the NLR and E4-E6 for the ANN, revealed white noise sequences within the 99% confidence bounds. Lastly, the relative importance of the inputs to the computed output was estimated using the Garson method (Garson, 1991). The technique involves partitioning the hidden–output connection weights of each neuron into components associated with each input neuron (Goh, 1995). The hidden–output connection weights of each hidden neuron were partitioned into components associated with each input neuron, as seen in Table C1.
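The weight-partitioning idea can be sketched as follows. This is one common formulation of the Garson method for a network with a single output, written with a hypothetical function name and hypothetical weights: each hidden–output weight is split among the inputs in proportion to the absolute input–hidden weights feeding that hidden neuron, the shares are summed per input, and the totals are normalised to one.

```python
# Sketch of Garson-style relative importance for a single-output MLP.
# The weights below are hypothetical, not those of Tables A1-A18.

def garson_importance(w_input_hidden, w_hidden_output):
    """Relative importance of each input.

    w_input_hidden[j][i] is the weight from input i to hidden neuron j;
    w_hidden_output[j] is the weight from hidden neuron j to the output.
    """
    n_inputs = len(w_input_hidden[0])
    contribution = [0.0] * n_inputs
    for weights, w_out in zip(w_input_hidden, w_hidden_output):
        total = sum(abs(w) for w in weights)
        if total == 0.0:
            continue  # a hidden neuron with no input weights carries no share
        for i, w in enumerate(weights):
            # share of this hidden-output weight attributed to input i
            contribution[i] += abs(w) / total * abs(w_out)
    grand_total = sum(contribution)
    if grand_total == 0.0:
        raise ValueError("all connection weights are zero")
    return [c / grand_total for c in contribution]


# Hypothetical 3-input, 2-hidden-neuron network.
example_w_ih = [[0.8, -0.1, 0.1],
                [0.2, 0.3, -0.5]]
example_w_ho = [1.5, -0.5]

importance = garson_importance(example_w_ih, example_w_ho)
for index, value in enumerate(importance):
    print(f"input {index}: {value:.3f}")
```

Because absolute values are used, the result ranks the inputs by the magnitude of their influence and carries no information about its direction.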

The input and layer weights used for the tissue/ blood partition coefficients and the kinetic parameters are presented in Tables A1 – A18 in the appendix.

3 Results

3.1 Prediction of Partition Coefficients

The values of the coefficients c, e, s, a, b and v of Abraham’s solvation equation for the partition coefficients, together with their standard errors, were calculated using the NLR method and are presented in Table 2. The predicted values of the tissue (kidney, heart, adipose, liver, muscle, brain, lung)/ blood partition coefficients of the chemical compounds of the training set are shown in Table 3 for both the ANN and the NLR method.

Table 2. Calculated Coefficients for Tissue/ Blood Partitioning using NLR Method (Est.: estimate; SE: standard error).

Coefficient | Kidney Est. | Kidney SE | Heart Est. | Heart SE | Adipose Est. | Adipose SE | Liver Est. | Liver SE | Muscle Est. | Muscle SE | Brain Est. | Brain SE | Lung Est. | Lung SE
c | -0.34 | 0.20 | -0.35 | 0.29 | 0.71 | 0.24 | -0.15 | 0.21 | -0.07 | 0.22 | -0.29 | 0.17 | -0.21 | 0.14
e | 0.21 | 0.12 | -0.23 | 0.17 | -0.06 | 0.14 | 0.17 | 0.13 | 0.07 | 0.13 | 0.39 | 0.10 | 0.07 | 0.08
s | -0.40 | 0.18 | 0.51 | 0.26 | 0.18 | 0.22 | -0.45 | 0.19 | -0.55 | 0.20 | -0.52 | 0.15 | 0.01 | 0.13
a | 0.24 | 0.35 | -0.13 | 0.49 | -1.71 | 0.41 | -0.08 | 0.36 | 0.18 | 0.37 | -0.07 | 0.29 | 0.17 | 0.24
b | -0.62 | 0.22 | -0.73 | 0.32 | -3.25 | 0.26 | -0.81 | 0.23 | -0.78 | 0.24 | -0.83 | 0.19 | -0.40 | 0.15
v | 1.02 | 0.22 | 0.90 | 0.31 | 1.72 | 0.26 | 1.10 | 0.22 | 0.85 | 0.23 | 1.19 | 0.18 | 0.49 | 0.15

Table 3. Predicted Values of Tissue/ Blood Partition Coefficients of the Compounds of the Training Set using ANN and NLR method.

Divinyl ether Enflurane Ethanol Fluroxene Heptane Hexane Isoflurane Methanol Methoxyflurane Methylcyclopentane Pentane Propanone

NLR 2.44 1.17 2.17 2.20 0.86

ANN 75.86 0.72 23.33 22.71 0.31

NLR 58.62 0.68 31.22 29.27 0.39

4.87 4.87 0.71 5.33 4.87 1.86

4.28 4.28 0.78 5.95 4.28 1.66

3.17 3.17 0.72 3.74 3.17 2.18

3.19 3.19 0.72 4.27 3.19 2.06

205.59 205.59 0.25 199.88 205.59 66.64

223.46 223.46 0.24 390.74 223.46 34.59

5.35 1.41 0.98 0.91 0.69 1.91

3.50 1.21 1.02 1.08 1.36 1.65

3.74 2.12 0.70 0.99 1.89 3.74

2.44 1.12 1.75 1.24 1.86 3.24

203.02 27.99 14.04 5.90 15.81 72.86

0.71 0.93 5.33 4.87 1.51 0.72

0.62 1.46

3.74 0.78

0.64 1.68 4.27 3.19 3.88 0.48

0.16 24.45 199.88 205.59 49.11

1.52 5.33 1.58 0.71

1.70 3.37 3.08 0.48

3.17 3.74 0.52 0.79

4.30 2.55 2.38 1.23

62.92 198.68 104.80 0.44

5.95 4.28 1.54 0.47

0.68 2.87 3.74 3.17

ANN 4.90 0.83 1.32 1.26 0.74


ANN 2.85 0.79 3.74 2.67 0.73

0.14

Brain

Lung

NLR 3.05 1.20 1.90 1.77 0.84

ANN 2.15 0.63 1.48 1.15 0.71

NLR 2.03 1.07 1.66 1.75 0.81

ANN 2.51 0.62 1.21 1.23 0.60

NLR 2.98 1.07 1.50 1.19 0.73

ANN 1.41 0.75 1.48 1.15 0.84

NLR 1.44 1.10 1.26 1.16 0.94

10.23 10.23 0.74 7.00 10.23 3.63

7.87 7.87 0.77 11.24 7.87 2.45

6.31 6.31 0.69 7.54 6.31 2.67

5.58 5.58 0.75 7.36 5.58 1.55

8.85 8.85 0.60 7.00 8.85 2.82

7.12 7.12 0.67 10.49 7.12 2.63

1.85 1.85 0.82 1.87 1.85 1.87

1.81 1.81 0.85 2.12 1.81 1.37

144.82 28.15 19.99 3.49 28.00 34.89

8.51 1.55 1.20 0.91 1.12 2.82

6.07 1.89 1.41 1.51 2.08 2.47

6.59 1.32 0.80 0.90 0.91 2.69

4.17 1.56 1.11 1.17 1.53 1.99

8.61 1.23 1.00 1.07 1.45 1.82

6.16 1.81 1.29 1.25 1.88 1.80

2.09 0.87 0.98 0.82 0.89 2.01

1.69 1.07 1.16 0.94 1.20 1.38

0.22 13.99 390.74 223.46 49.33 0.11 55.18

0.71 1.45

0.59 2.18 11.24 7.87 2.35 0.42

0.65 1.28 7.54 6.31 1.70 0.63

0.62 1.59 7.36 5.58 1.83 0.48

0.50 1.94 10.49 7.12 1.68 0.35

0.87 0.96 1.87 1.85 1.15 1.04

0.80 1.15 2.12 1.81 1.41 0.71

2.51 5.88 5.51 0.58

1.95 6.18 1.87 0.78

1.64 4.11

0.60 1.44 7.00 8.85 2.09 0.60 1.95

2.11 5.74 4.84 0.46

1.15 1.98 1.33 0.85

1.55 1.67 1.54 0.78


NLR 1.94 1.19 1.30 1.17 0.86


3-Methylpentane Benzene Cyclohexane Cyclopropane Dichloromethane Diethylether

ANN 2.04 0.71 0.69 1.15 0.71


1-Chloro-2,2-difluoroethene 1-Propanol 2,2-Dimethylbutane 2-Methylpentane 2-Propanol 3-Methylhexane

Heart


1,1,1-Trichloroethane 1-Butanol 1-Chloro-2,2,2-trifluoroethane

Kidney


Chemical Compound

Tissue/ blood Partition Coefficients Adipose Liver Muscle

146.51 127.80 1.26

7.00 10.23 2.95 0.71 2.51 9.13 5.50 0.79

4.23 0.45

8.82 5.75 0.62


3.31 3.36 1.99 2.77

73.57 50.06 33.38 94.20

43.01 78.22 27.27 60.55

5.01 2.69 1.66 4.68

1.86 1.38

2.23 1.59

2.05 0.81

2.12 2.08

70.67 34.62

54.36 32.13

3.55 2.14

3.00 4.15 2.55 3.49

2.50 1.00 3.80 3.18

2.51 2.10 2.36 2.04

2.19 8.28 1.82 3.55

2.02 5.16

3.34 2.22

2.36 1.50

2.32 1.69

2.57 2.51


3.74 8.37 3.73 2.89

2.02 3.84

2.18 2.04 1.10 2.03

1.36 1.83 1.34

3.53 2.15

1.74 1.15

1.53 1.40

1.60


1.86 2.81 1.76 2.30


Teflurane Toluene Trichloroethene Trichloromethane

3.31 4.83 2.50 1.74


Sevoflurane Styrene


3.2 Prediction of Metabolic Parameters

The values of the coefficients c, e, s, a, b and v of Abraham’s solvation equation for the metabolic parameters, together with the corresponding standard errors, were also calculated using the NLR method and are presented in Table 4. The predicted values of the metabolic parameters (normalized maximal velocity and Michaelis – Menten constant) of the compounds of the training set are shown in Table 5 for both the ANN and the NLR method.

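Table 4. Calculated Coefficients for Metabolic Parameters using NLR Method (Est.: estimate; SE: standard error).

Coefficient | Normalized Maximal Velocity Est. | Normalized Maximal Velocity SE | Michaelis – Menten Constant Est. | Michaelis – Menten Constant SE
c | 1.65 | 0.21 | 0.68 | 0.37
e | 0.28 | 0.45 | 0.15 | 0.80
s | -0.30 | 0.57 | -0.54 | 1.00
a | 1.63 | 1.22 | -1.04 | 2.16
b | 4.45 | 1.06 | -1.21 | 1.87
v | -0.65 | 0.19 | 0.39 | 0.34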

Table 5. Predicted Values of Normalized Maximal Velocity and Michaelis – Menten Constant of the Compounds of the Training Set.

ANN 38.727 71.446

53.392 75.872 77.757 31.771 61.949 7.683 27.055 65.770 61.949 111.487 30.901

AC C

EP

1,1,1,2-Tetrachloroethane 1,1,2,2-Tetrachloroethane 1,1,2-Trichloroethane 1,1-Dichloroethane 1,2,4-Trimethylbenzene 1,2-Dichloroethane 1,3-Butadiene 2-Methylpentane Benzene Bromodichloromethane Chloroethane Chloromethane cis-1,2-Dichloroethene Decane Dichloromethane Ethylbenzene Halothane Heptane

Normalized Maximal Velocity (µmol/h/kg)

TE D

Chemical Compounds

5.924 47.106 60.182 46.664 7.680

NLR 36.677 65.486 77.305 62.016 60.548 65.805 54.840 10.817 67.143 33.062 51.175 51.123

Michaelis – Menten Constant (µmol/L) ANN 5.481 4.785 5.109

40.742

2.004 14.343 2.531 3.749 22.919 1.030 1.584 1.030 19.771 5.397

NLR 3.719 2.501 2.498 3.107 5.022 2.593 5.200 11.388 4.043 3.246 3.818 3.450 3.006

4.658 45.238 49.257 53.605 8.763

7.911 4.445 9.768 2.035 18.989

18.954 2.911 5.141 2.900 12.935


trans-1,2-Dichloroethene Trichloroethene Vinyl chloride

10.817 14.303 5.750

6.206 88.258 7.684 37.305 30.904 83.757

7.099 61.560 14.377 54.074 43.127 30.901

39.998

33.816

3.379 22.919 42.364 9.090

7.252 11.388 10.430 16.687

12.223 3.837 33.879 1.412 1.527 1.566 1.615

14.692 4.389 7.418 4.578 4.023 5.290 4.372

Methylcyclohexane Nonane Octane Styrene Tetrachloroethene Toluene

8.126

8.394 7.683 16.442 5.924

Hexachloroethane Hexane

3.3 Comparison of the Statistical Methods

A comparison of experimental and predicted values of the partition coefficients using the NLR and the ANN methods is presented in Figures 2-8. The values predicted with these techniques are also compared with literature values obtained from Zhang's nonlinear model equation (Zhang, 2004), which predicts the logarithm of the tissue/blood partition coefficient for the seven main tissues based on tissue composition.

Figure 2. Predicted vs experimental values of kidney/blood partition coefficient.

Figure 3. Predicted vs experimental values of heart/blood partition coefficient.

Figure 4. Predicted vs experimental values of adipose/blood partition coefficient.

Figure 5. Predicted vs experimental values of liver/blood partition coefficient.

Figure 6. Predicted vs experimental values of muscle/blood partition coefficient.

Figure 7. Predicted vs experimental values of brain/blood partition coefficient.

Figure 8. Predicted vs experimental values of lung/blood partition coefficient.

The values of the metabolic constants calculated with the statistical methods of this study were compared with the literature values of Price and Krishnan (2011) in Figures 9 and 10. The methodology followed by Price and Krishnan (2011) was based on the group contribution method, which implies that each fragment in the molecular structure contributes to the metabolic parameters according to its frequency of occurrence in the given molecule (Gao et al., 1992).

Figure 9. Predicted vs experimental values of normalized maximal velocity.

Figure 10. Predicted vs experimental values of Michaelis-Menten constant.

3.4 Expanding the Domain of Applicability


The combination of Abraham's equation and the ANN method was used to estimate tissue/blood partition coefficients for the main human tissues for several chemical compounds with unknown partition coefficients, in order to expand the chemical space to which the developed QSAR model applies. These compounds were categorized into chemical families (hydrocarbons, aromatic and halogenated hydrocarbons, alcohols, ketones, ethers and esters) according to their chemical structure. The categorization of the chemical compounds into groups is presented in Table 6.

Table 6. Classification of Chemical Compounds.

Halogenated Hydrocarbons: 1,1,1,2-Tetrachloroethane; 1,1,2-Trichloroethane; 1,2-Dichloropropane; 2-Chloropropane; Bromochloromethane; Chloroethane; Halothane; 1,1,2,2-Tetrachloroethane; 1,1-Dichloroethane; 1-Bromo-2-chloroethane; Chloromethane; Bromodichloromethane; Hexachloroethane; cis-1,2-Dichloroethene; Halopropane; 1,2-Dichloroethane; 1-Chloropropane; Tetrachloromethane; Chlorodibromomethane; Tribromomethane; trans-1,2-Dichloroethene; 2-Bromopropane; Fluorotrichloromethane; Tetrachloroethene; Fluoroethane; 1,1,1,2-Tetrafluoroethane; 1-Chlorobutane; Vinyl Chloride; β-Chloroprene; 1-Chloropentane; 1-Fluoropropane; 1,1,2,2,3-Pentafluoropropane; 1-Chloropentane; Iodoethane; Tetrafluoromethane; Vinyl Bromide; 2-Fluoropropane; 1,2-Dichlorotetrafluoroethane; 1,1,2,2,3,3,4,4-Octafluorobutane; 1,2,3-Trichloropropane; 1-Bromopropane; 1,1-Difluoroethane

Alcohols: 1-Pentanol; 1-Methoxy-2-propanol; 2-Butoxyethanol; 2-Methyl-1-propanol; 3-Methyl-1-butanol; 2-Methoxyethanol; 2-Methyl-2-propanol

Ketones: Butanone; 3-Pentanone; 2-Butanone; 2-Hexanone; 2-Methylcyclohexanone; 2-Heptanone; 2-Pentanone; Cyclohexanone; 3-Methyl-2-pentanone; Isophorone; 4-Methyl-2-pentanone

Hydrocarbons: Ethane; 1,3-Butadiene; Octane; Ethene; 2-Methyl-1,3-butadiene; 2,2,4-Trimethylpentane; Nonane; Propene; 2,3-Dimethylbutane; Methylcyclohexane; Decane; Acetylene; Methane

Ethers: Desflurane; Ethyl tert-butyl ether; Dimethyl ether; Ethyl tert-pentyl ether; Ethylene oxide; Methyl tert-butyl ether; tert-Amyl methyl ether

Esters: Propyl acetate; Isopropyl acetate; Ethyl formate; Ethyl acetate; Isobutyl acetate; Methyl acetate; Butyl acetate; Isopentyl acetate

Aromatic Hydrocarbons: Allylbenzene; 1,2,4-Trimethylbenzene; 1,2-Dimethylbenzene; Chlorobenzene; Isopropylbenzene; 1,2,3-Trimethylbenzene; Propylbenzene; 1,3-Dimethylbenzene; Ethylbenzene; 1,3-Dichlorobenzene; 1,3,5-Trimethylbenzene; 1,4-Dimethylbenzene; 1,2-Dichlorobenzene

In order to validate the results obtained from Abraham’s solvation equation, a simpler QSAR model was used, described in equation 3,

$$P_{t/b} = \frac{P_{ow} \cdot Fl_{tissue} + Fw_{tissue}}{P_{ow} \cdot Fl_{blood} + Fw_{blood}} \qquad (3)$$


where Pow is the octanol/water partition coefficient, Fltissue and Fwtissue are the fractional contents of lipids and water in the tissue, respectively, and Flblood and Fwblood are the fractional contents of lipids and water in blood, respectively (Hayes, 2007).

The tissue composition was obtained from Baláž and Lukáčová (1999), while the octanol/water partition coefficient of the chemical compounds was calculated with the program EPIWEB 4.1. The tissue/blood partition coefficients for each chemical group were calculated with both Abraham's solvation equation and equation 3. The square of the correlation coefficient, R2, was used as an indicator of the model's goodness of fit and is presented in Figures 11-17.
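As a minimal sketch of how equation 3 can be evaluated, the function below computes the tissue/blood partition coefficient from Pow and the lipid and water fractions. The function name and all numerical inputs are illustrative assumptions: they are not the tissue compositions of Baláž and Lukáčová (1999) or EPIWEB output.

```python
def tissue_blood_partition(p_ow, fl_tissue, fw_tissue, fl_blood, fw_blood):
    """Tissue/blood partition coefficient following equation 3.

    p_ow       : octanol/water partition coefficient (not its logarithm)
    fl_*, fw_* : fractional lipid and water contents of tissue and blood
    """
    denominator = p_ow * fl_blood + fw_blood
    if denominator <= 0:
        raise ValueError("blood fractions must give a positive denominator")
    return (p_ow * fl_tissue + fw_tissue) / denominator


# Hypothetical inputs, chosen only to exercise the formula.
log_p_ow = 2.0                # assumed log Pow of an example compound
p_ow = 10 ** log_p_ow         # equation 3 uses Pow itself, not log Pow
p_tb = tissue_blood_partition(
    p_ow,
    fl_tissue=0.05, fw_tissue=0.75,   # placeholder tissue composition
    fl_blood=0.005, fw_blood=0.80,    # placeholder blood composition
)
print(f"Pt/b = {p_tb:.2f}")
```

Because Pow multiplies only the lipid fractions, the ratio approaches Fwtissue/Fwblood for hydrophilic compounds and Fltissue/Flblood for strongly lipophilic ones.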

Figure 11. Square of correlation coefficient for kidney/blood partition coefficient.

Figure 12. Square of correlation coefficient for heart/blood partition coefficient.

Figure 13. Square of correlation coefficient for adipose/blood partition coefficient.

Figure 14. Square of correlation coefficient for liver/blood partition coefficient.

Figure 15. Square of correlation coefficient for muscle/blood partition coefficient.

Figure 16. Square of correlation coefficient for brain/blood partition coefficient.

Figure 17. Square of correlation coefficient for lung/blood partition coefficient.

4 Discussion

The goodness of fit of the models examined was evaluated from the square of the correlation coefficient, R2, which, according to Figures 2-10, is high for all the models used in this study. The results suggest that the most significant factors influencing the QSAR are successfully captured. These factors include the type of compounds that compose the training set, the physicochemical and biochemical experimental input data, the molecular descriptors of the equation applied and the statistical methods used for data analysis. In addition, the parameters used for training the ANN reflected the physicochemical properties of the tested compounds. These properties depend on the constituent atoms, the chemical groups and the bonds between them.
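For reference, the goodness-of-fit measure used here can be sketched as the squared Pearson correlation between experimental and predicted values. The function name and the example values are assumptions made for illustration only and are not data from this study.

```python
def squared_correlation(observed, predicted):
    """Square of the Pearson correlation coefficient between two samples."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squares of the centred values.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov * cov / (var_o * var_p)


# Hypothetical experimental and predicted values (illustration only).
experimental = [1.2, 2.5, 3.1, 4.8, 5.0]
predicted = [1.0, 2.9, 3.0, 4.5, 5.4]
print(f"R2 = {squared_correlation(experimental, predicted):.3f}")
```

Note that the squared correlation measures linear association only: predictions that are systematically offset or scaled can still give a value close to 1, which is why it is usually read alongside a predicted-versus-experimental plot.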


In previous studies, the parameters used to describe the interactions between chemicals and tissues were mainly related to chemical structure or to tissue composition in terms of water, proteins and lipids (Price and Krishnan, 2011; Zhang, 2004). In the present study, the descriptors of Abraham's equation are not linked directly with tissue composition. They encode specific chemical information regarding the size, polarizability and hydrogen bonding of the examined chemicals, and each term can reveal the factors that influence a particular interaction. From the coefficients of the solvation equation estimated for the partition coefficients by non-linear regression (Table 2) and the descriptors of the chemicals (Table 1), it follows that the non-polar term eE is positive for the majority of the compounds for kidney, liver, muscle, brain and lung, and negative for heart and adipose tissue. This indicates that the general dispersion interactions of the solute contribute positively to the logarithm of the tissue/blood partition coefficient for the former tissues and negatively for heart and adipose tissue. The term vV is positive for all the compounds, so molecular size contributes positively to logPt/b. Regarding the two polar terms of the equation, sS and aA, we conclude the following:

(a) The first term is an indicator of solute dipolarity/polarizability and contributes negatively to the logarithm of the kidney, liver, muscle and brain/blood partition coefficients and positively to the logarithm of the heart, adipose and lung/blood partition coefficients.

(b) The sign of the second term shows that solute hydrogen bond acidity contributes positively to the logarithm of the kidney, muscle and lung/blood partition coefficients and negatively to that of the remaining tissues.
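The term-by-term reading above can be illustrated numerically. The sketch below assumes the standard linear form of Abraham's equation, c + eE + sS + aA + bB + vV; the NLR form fitted in this study may carry additional exponents that are not shown here. The Table 4 estimates for normalized maximal velocity serve as example coefficients, and the solute descriptor values are hypothetical.

```python
# Table 4 NLR estimates for normalized maximal velocity (c, e, s, a, b, v).
VMAX_COEFFS = {"c": 0.21, "e": 0.28, "s": -0.30, "a": 1.63, "b": 4.45, "v": -0.65}


def abraham_terms(coeffs, descriptors):
    """Per-term contributions of c + eE + sS + aA + bB + vV (assumed linear form)."""
    terms = {"c": coeffs["c"]}
    for coef_name, desc_name in (("e", "E"), ("s", "S"), ("a", "A"), ("b", "B"), ("v", "V")):
        terms[coef_name + desc_name] = coeffs[coef_name] * descriptors[desc_name]
    return terms


# Hypothetical solute descriptors (illustration only, not taken from Table 1).
solute = {"E": 0.60, "S": 0.50, "A": 0.00, "B": 0.14, "V": 0.86}

terms = abraham_terms(VMAX_COEFFS, solute)
total = sum(terms.values())
for name, value in terms.items():
    sign = "positive" if value > 0 else "negative" if value < 0 else "zero"
    print(f"{name:>2}: {value:+.3f} ({sign})")
print(f"sum of terms: {total:+.3f}")
```

Since the descriptors of most neutral solutes are non-negative, the sign of each contribution largely follows the sign of its coefficient, which is what the tissue-by-tissue statements above describe.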

The polar term bB, which is an indicator of solute hydrogen bond basicity, makes the logarithm of the partition coefficient more negative for all tissues. The aforementioned terms were also analyzed for the metabolic parameters, where their influence differs depending on the tissue and the property referred to. The coefficients also indicate the tendency of a given solute to undergo solute-solvent interactions. In general, for tissue/blood partitioning, when a coefficient is equal or close to zero, the property associated with that coefficient is balanced between the two phases. Otherwise, the property of the solute is bigger (positive) or smaller (negative) than that of the solvent. The relative importance of the input descriptors to the predicted parameters is estimated via the ANN model.
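One common way to read descriptor importance from a trained feed-forward network is a Garson-type partitioning of the connection weights (Garson, 1991, appears in the reference list). Whether this exact variant was used to produce Table C1 is an assumption; the sketch below handles a single hidden layer and a single output, and all weights are hypothetical.

```python
def relative_importance(input_hidden, hidden_output):
    """Garson-type relative importance of each input.

    input_hidden[i][j] : weight from input i to hidden neuron j
    hidden_output[j]   : weight from hidden neuron j to the single output
    """
    scores = [0.0] * len(input_hidden)
    for j, out_weight in enumerate(hidden_output):
        column_total = sum(abs(row[j]) for row in input_hidden)
        if column_total == 0:
            continue  # this hidden neuron receives no signal from any input
        for i, row in enumerate(input_hidden):
            # Share of input i in hidden neuron j, weighted by the neuron's
            # absolute connection to the output.
            scores[i] += abs(row[j]) / column_total * abs(out_weight)
    total = sum(scores)
    if total == 0:
        raise ValueError("all connection weights are zero")
    return [score / total for score in scores]


descriptors = ["E", "S", "A", "B", "V"]
# Hypothetical weights of a 5-input, 2-hidden-neuron network (illustration only).
input_hidden = [
    [0.4, -0.1],
    [-0.9, 0.3],
    [0.2, 0.2],
    [0.5, -0.6],
    [1.1, 1.4],
]
hidden_output = [0.8, -1.2]

importance = relative_importance(input_hidden, hidden_output)
for name, share in zip(descriptors, importance):
    print(f"{name}: {100 * share:.1f} %")
```

Because only absolute weights enter the calculation, the result ranks descriptors by magnitude of influence and says nothing about the direction of the effect.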


As shown in Table C1 in the appendix, the most critical descriptor for the estimated tissue/blood partition coefficients, except for the heart/blood partition coefficient, is the McGowan volume. The distribution of a chemical into a tissue depends on the rate of blood flow to the tissue, the tissue mass, and the partition characteristics between blood and tissue. A compound's distribution between tissues and blood at equilibrium is a function of the respective lipid and water fractions in each matrix (lipid content), and hence of the compound's lipophilicity (DeJongh et al., 1997; Salem and Katz, 2005; Smith and Williams, 2005). Thus, it is a logical conclusion that one of the major determinants of partitioning between blood and tissue is the McGowan volume, as it is a measure of lipophilicity. The polarizability descriptor (S) was found to be the most important descriptor influencing the heart/blood partition coefficient. Regarding the metabolic parameters, the McGowan volume is also the most important descriptor for the Michaelis-Menten constant, while the most important one for the maximal velocity of metabolism is by far the polarizability descriptor. Both the Michaelis-Menten constant and the maximal velocity determine the rate of metabolism of a chemical, which is an enzyme-catalyzed reaction. These reactions are influenced by the ionization state of the substrate or of the enzyme binding site. The maximal velocity, Vmax, reflects the turnover number of an enzyme, which is the number of substrate molecules converted into product by an enzyme molecule per unit time when the enzyme is fully saturated with substrate (Berg et al., 2002). It is therefore directly related to ionization, as well as polarizability.

The modeling results indicate that the molecular descriptors of the equation are suitable for estimating the parameters that characterize the physicochemical and biochemical phenomena. In non-linear regression, the generic form of the free energy relationship is maintained; what changes is the linear and the exponential coefficients of each parameter. In contrast, the ANN model can handle the linear free energy parameters without specifying their complex nonlinear relationships. This is particularly germane in situations of complex interactions, such as those between industrial chemicals and biological systems, especially when it comes to the enzyme/molecule interactions that must be described in order to estimate Km and Vmax. As a result, the improved performance of Abraham's equation can be ascribed to its capacity to represent the complex interactions of the micro-processes of chemicals' distribution and metabolism in several tissues.

With regard to the statistical analysis, two popular statistical methods were assessed: the ANN and the NLR. The modeling results indicated that the former performed exceptionally well with regard to the square of the correlation coefficient (R2). This is indicative of the advantage of the ANN over the NLR method in handling complex, multivariate problems: the ANN could handle the nonlinear relationships between the independent and dependent variables of the equation used, whereas the NLR method strongly depends on the inherent optimization algorithm to make the appropriate transformations to the model parameters in order to capture the inherent nonlinearities. Focusing on the estimated residuals from both models, the ANN model was found to be the favorable one. Graphical residual analysis revealed an underlying normal distribution with mean close to zero and no systematic errors present. In addition, the sample autocorrelation values were found to lie within the 99% confidence bounds for the autocorrelation of a white noise sequence, except at zero lag. Therefore, we can conclude that the residuals correspond to white noise. Furthermore, according to Figures 11-17, the correlation of the proposed method with the QSAR model from the literature is satisfactory for most of the chemical groups. In general, the predictive capacity of the model for new chemical entities is influenced by the chemical nature of the molecules of the training set used for development of the model. The test set molecules can be predicted well when they have a chemical structure similar to those of the training set. The results indicate that the model has captured the features common to the training set molecules.

In general, QSARs combined with PBTK modeling allow the prediction of the toxicokinetics of chemical compounds for which not all data have yet been experimentally determined. This might imply uncertainty in the predictions. In this regard, sensitivity analysis would help identify the key parameters for which small deviations could have a significant impact on the output of the model (Hetrick et al., 1991). Furthermore, the REACH regulation (Registration, Evaluation, Authorisation and Restriction of Chemicals) opens the possibility of evaluating chemicals not only on a one-by-one basis, but also by grouping chemicals in categories depending on their degree of structural, reactivity, metabolic and physicochemical similarity to chemicals with known toxicological data (ECHA, 2008). Forming chemical categories and then using measured data on a few category members to estimate the missing values for the untested members is a common application of QSARs (Raunio, 2011). Once additional data are collected, the 'leave-one-out' cross-validation technique will be applied to the selected neural network. At the moment, there is a continuous effort to enhance our dataset with new experimental data and to include compounds with a broader range of physicochemical and biophysical properties, which will allow us to cover an even larger chemical space. At the same time, emerging ANN techniques such as deep learning algorithms (LeCun et al., 2015; Li et al., 2016) are being tested, aiming at expanding the predictive capability of the models beyond the training domain.
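The white-noise check on the residuals mentioned earlier in this discussion can be sketched as follows. The sketch assumes the usual large-sample approximation, in which the sample autocorrelations of a white noise sequence of length N fall within ±2.576/√N with about 99% probability; the function names and the residual values are hypothetical, and the authors' own procedure may differ in detail.

```python
import math


def sample_autocorrelation(residuals, max_lag):
    """Sample autocorrelation of the residuals for lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("residuals are constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def looks_like_white_noise(residuals, max_lag, z=2.576):
    """True if every non-zero-lag autocorrelation lies within +/- z / sqrt(N)."""
    bound = z / math.sqrt(len(residuals))
    acf = sample_autocorrelation(residuals, max_lag)
    return all(abs(r) <= bound for r in acf[1:]), bound, acf


# Hypothetical residuals (illustration only, not taken from the fitted models).
residuals = [0.12, -0.05, 0.30, -0.22, 0.08, -0.15, 0.02, 0.19, -0.27, 0.04,
             -0.09, 0.11, -0.01, 0.25, -0.18, 0.06]
ok, bound, acf = looks_like_white_noise(residuals, max_lag=5)
print(f"99% bound: +/-{bound:.3f}; within bounds at all lags > 0: {ok}")
```

The zero-lag value is excluded because it equals 1 by construction, which matches the "except at zero lag" qualification in the text.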

Based on the present study, the successful prediction of chemical-specific distribution and metabolic constants offers major advantages, as listed below:

- It fills the data gaps of "data poor" chemicals, allowing the use of internal dosimetry metrics for a wide array of chemical classes. By estimating the corresponding Biologically Effective Dose (BED) at the target tissue, risk characterization could be based upon the potential toxicity of the chemical using the concept of the Biological Pathway Altering Dose (BPAD) (Judson et al., 2010; Judson et al., 2011). BPADs are derived from relatively inexpensive, high-throughput screening (HTS) in vitro data. Physiology Based ToxicoKinetic (PBTK) modeling that is as detailed as possible is the key component for estimating the in vivo doses required to achieve the BPAD in the target tissue. Information about the BPAD for several chemicals is publicly available in ToxCastDB (USEPA).

- It effectively supports the "safe by design" concept for industrial chemicals, by allowing the successful prediction of toxicokinetic behavior based on molecular descriptors (described by the Abraham solvation equation), thus avoiding chemical structures that result in bioaccumulative and non-rapidly metabolized chemicals.


5 Conclusion

This work brings forward a model that can simultaneously predict chemical distribution and metabolism in non-excretory (brain, heart, lung, muscle and fat) and excretory tissues (kidney and liver). The model shows strong predictive power for a variety of chemical compounds, including the training and the test set of the chemical groups. The training set gives a good correlation between tissue/blood partition coefficients, metabolic parameters and the selected physicochemical properties of the chemicals. This is the result of Abraham's solvation equation, which successfully describes the interactions of xenobiotics with biological systems. Furthermore, the use of ANN provides the flexible mathematical framework needed for capturing the non-linear chemico-biological interactions. This enables the model to be efficiently applied to a number of chemicals with unknown parameters, as well as to predict the toxicokinetic behavior of newly designed chemical compounds.

6 Acknowledgements

The authors gratefully acknowledge the financial support of CEFIC under the CEFIC-LRI project B11: Integrated External and Internal Exposure Modelling Platform (INTEGRA) and of the European Commission under grants CROME-LIFE, which was funded from the LIFE+ program (LIFE12 ENV/GR/001040), and HEALS, which was funded from the 7th RTD Framework Programme of the European Union (Grant agreement no: 603946).

7 References

Abraham, M.H., 1993. Application of solvation equations to chemical and biochemical processes. Pure and Applied Chemistry 65, 2503-2512.

Abraham, M.H., Acree Jr, W.E., 2013. Descriptors for the Prediction of Partition Coefficients and Solubilities of Organophosphorus Compounds. Separation Science and Technology (Philadelphia) 48, 884-897.

Abraham, M.H., Chadha, H.S., Mitchell, R.C., 1994a. Hydrogen bonding. 33. Factors that influence the distribution of solutes between blood and brain. Journal of Pharmaceutical Sciences 83, 1257-1268.

Abraham, M.H., Chadha, H.S., Whiting, G.S., Mitchell, R.C., 1994b. Hydrogen bonding. 32. An analysis of water-octanol and water-alkane partitioning and the ∆log P parameter of Seiler. Journal of Pharmaceutical Sciences 83, 1085-1100.

Abraham, M.H., Dearden, J.C., Bresnen, G.M., 2006. Hydrogen bonding, steric effects and thermodynamics of partitioning. Journal of Physical Organic Chemistry 19, 242-248.

Abraham, M.H., Gola, J.M., Ibrahim, A., Acree, W.E., Liu, X., 2013. The prediction of blood-tissue partitions, water-skin partitions and skin permeation for agrochemicals. Pest Management Science.

Abraham, M.H., Gola, J.M.R., Ibrahim, A., Acree Jr, W.E., Liu, X., 2015. A simple method for estimating in vitro air-tissue and in vivo blood-tissue partition coefficients. Chemosphere 120, 188-191.

Abraham, M.H., Martins, F., Mitchell, R.C., Salter, C.J., 1999. Hydrogen bonding. 47. Characterization of the ethylene glycol-heptane partition system: Hydrogen bond acidity and basicity of peptides. Journal of Pharmaceutical Sciences 88, 241-247.

Abraham, M.H., Weathersby, P.K., 1994. Hydrogen bonding. 30. Solubility of gases and vapors in biological liquids and tissues. Journal of Pharmaceutical Sciences 83, 1450-1456.

Aoyama, T., Suzuki, Y., Ichikawa, H., 1990. Neural networks applied to structure-activity relationships. Journal of Medicinal Chemistry 33, 905-908.

Baláž, S., Lukáčová, V., 1999. A model-based dependence of the human tissue/blood partition coefficients of chemicals on lipophilicity and tissue composition. Quantitative Structure-Activity Relationships 18, 361-368.

Béliveau, M., Lipscomb, J., Tardif, R., Krishnan, K., 2005. Quantitative structure-property relationships for interspecies extrapolation of the inhalation pharmacokinetics of organic chemicals. Chemical Research in Toxicology 18, 475-485.

Béliveau, M., Tardif, R., Krishnan, K., 2003. Quantitative structure-property relationships for physiologically based pharmacokinetic modeling of volatile organic chemicals in rats. Toxicology and Applied Pharmacology 189, 221-232.

Berg, J.M., Tymoczko, J., Stryer, L., 2002. Biochemistry, 5th ed. W H Freeman, New York.

Breiman, L., 2001. Random Forests. Machine Learning 45, 5-32.

Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A., 1984. Classification and regression trees. CRC Press.

Cristianini, N., Shawe-Taylor, J., 2000. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press.

DeJongh, J., Verhaar, H.J.M., Hermens, J.L.M., 1997. A quantitative property-property relationship (QPPR) approach to estimate in vitro tissue blood partition coefficients of organic chemicals in rats and humans. Archives of Toxicology 72, 17-25.

Dimitriou-Christidis, P., Autenrieth, R.L., Abraham, M.H., 2008. Quantitative structure-activity relationships for kinetic parameters of polycyclic aromatic hydrocarbon biotransformation. Environmental Toxicology and Chemistry 27, 1496-1504.

Dudek, A.Z., Arodz, T., Gálvez, J., 2006. Computational methods in developing quantitative structure-activity relationships (QSAR): A review. Combinatorial Chemistry and High Throughput Screening 9, 213-228.

ECHA, 2008. Guidance on Information Requirements and Chemical Safety Assessment Chapter R.6: QSARs and Grouping of Chemicals.

Eriksson, L., Antti, H., Holmes, E., Johansson, E., Lundstedt, T., Shockcor, J., Wold, S., 2003. Partial least squares (PLS) in cheminformatics. Handbook of Chemoinformatics: From Data to Knowledge in 4 Volumes, 1134-1166.

Galliani, G., Rindone, B., Dagnino, G., Salmona, M., 1984. Structure reactivity relationships in the microsomal oxidation of tertiary amines. European Journal of Drug Metabolism and Pharmacokinetics 9, 289-293.

Gao, C., Govind, R., Tabak, H.H., 1992. Application of the group contribution method for predicting the toxicity of organic chemicals. Environmental Toxicology and Chemistry 11, 631-636.

Garson, G.D., 1991. Interpreting neural-network connection weights. AI Expert 6, 47-51.

Goh, A.T.C., 1995. Back-propagation neural networks for modeling complex systems. Artificial Intelligence in Engineering 9, 143-151.

Hansch, C., 1969. Quantitative approach to biochemical structure-activity relationships. Accounts of Chemical Research 2, 232-239.

Hayes, A.W., 2007. Principles and Methods of Toxicology, Fifth Edition. CRC Press.

Hetrick, D.M., Jarabek, A.M., Travis, C.C., 1991. Sensitivity analysis for physiologically based pharmacokinetic models. Journal of Pharmacokinetics and Biopharmaceutics 19, 1-20.

Judson, R.S., Houck, K.A., Kavlock, R.J., Knudsen, T.B., Martin, M.T., Mortensen, H.M., Reif, D.M., Rotroff, D.M., Shah, I., Richard, A.M., Dix, D.J., 2010. In vitro screening of environmental chemicals for targeted testing prioritization: The ToxCast project. Environmental Health Perspectives 118, 485-492.

Judson, R.S., Kavlock, R.J., Setzer, R.W., Cohen Hubal, E.A., Martin, M.T., Knudsen, T.B., Houck, K.A., Thomas, R.S., Wetmore, B.A., Dix, D.J., 2011. Estimating toxicity-related biological pathway altering doses for high-throughput chemical risk assessment. Chemical Research in Toxicology 24, 451-462.

Knaak, J.B., Dary, C.C., Power, F., Thompson, C.B., Blancato, J.N., 2004. Physicochemical and biological data for the development of predictive organophosphorus pesticide QSARs and PBPK/PD models for human risk assessment. Critical Reviews in Toxicology 34, 143-207.

LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436-444.

Lewis, D.F.V., Sams, C., Loizou, G.D., 2003. A quantitative structure-activity relationship analysis on a series of alkyl benzenes metabolized by human cytochrome P450 2E1. Journal of Biochemical and Molecular Toxicology 17, 47-52.

Li, X., Peng, L., Hu, Y., Shao, J., Chi, T., 2016. Deep learning architecture for air quality predictions. Environmental Science and Pollution Research 23, 22408-22417.

Moré, J., 1978. The Levenberg-Marquardt algorithm: Implementation and theory, in: G.A. Watson (Ed.), Numerical Analysis. Springer Berlin Heidelberg, pp. 105-116.

ONS, 2016. Abraham Descriptor Prediction from SMILES, http://showme.physics.drexel.edu/onsc/models/AbrahamDescriptorsModel001.php, August 5, 2016.

Payne, M.P., Kenny, L.C., 2002. Comparison of models for the estimation of biological partition coefficients. Journal of Toxicology and Environmental Health, Part A 65, 897-931.

Pelekis, M., Krishnan, K., 2004. Magnitude and mechanistic determinants of the interspecies toxicokinetic uncertainty factor for organic chemicals. Regulatory Toxicology and Pharmacology 40, 264-271.

Peyret, T., Krishnan, K., 2011. QSARs for PBPK modelling of environmental contaminants. SAR and QSAR in Environmental Research 22, 129-169.

Peyret, T., Poulin, P., Krishnan, K., 2010. A unified algorithm for predicting partition coefficients for PBPK modeling of drugs and environmental chemicals. Toxicology and Applied Pharmacology 249, 197-207.

Poulin, P., Krishnan, K., 1995. An algorithm for predicting tissue: Blood partition coefficients of organic chemicals from n-octanol: Water partition coefficient data. Journal of Toxicology and Environmental Health 46, 117-129.

Poulin, P., Krishnan, K., 1995a. A biologically-based algorithm for predicting human tissue: Blood partition coefficients of organic chemicals. Human and Experimental Toxicology 14, 273-280.

Price, K., Krishnan, K., 2011. An integrated QSAR-PBPK modelling approach for predicting the inhalation toxicokinetics of mixtures of volatile organic chemicals in the rat. SAR and QSAR in Environmental Research 22, 107-128.

Puzyn, T., Leszczynski, J., Cronin, M.T.D., 2010. Recent Advances in QSAR Studies. Springer Science+Business Media, New York.

Raunio, H., 2011. In silico toxicology non-testing methods. Frontiers in Pharmacology JUN.

Salem, H., Katz, S.A., 2005. Inhalation Toxicology, Second Edition. CRC Press.

Schmitt, W., 2008. General approach for the calculation of tissue to plasma partition coefficients. Toxicology in Vitro 22, 457-467.

Silipo, C., Hansch, C., 1975. Correlation analysis. Its application to the structure-activity relation of triazines inhibiting dihydrofolate reductase. Journal of the American Chemical Society 97, 6849-6861.

Smith, H.J., Williams, H., 2005. Smith and Williams' Introduction to the Principles of Drug Design and Action, Fourth Edition. CRC Press.

Sprunger, L.M., Gibbs, J., Acree Jr, W.E., Abraham, M.H., 2008. Correlation of human and animal air-to-blood partition coefficients with a single linear free energy relationship model. QSAR and Combinatorial Science 27, 1130-1139.

USEPA, 2013. ToxCast™ Data, http://www.epa.gov/ncct/toxcast/data.html, 26/4/2013.

Ventura, C., Latino, D.A., Martins, F., 2013. Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds. European Journal of Medicinal Chemistry 70, 831-845.

WHO, 2010. Characterization and application of Physiologically Based PharmacoKinetic models in risk assessment. ISBN 978 92 4 150090 6, IPCS Harmonization Project Document No. 9, Geneva, Switzerland.

Wolcott, R.M., Neal, R.A., 1972. Effect of structure on the rate of the mixed function oxidase catalyzed metabolism of a series of parathion analogs. Toxicology and Applied Pharmacology 22, 676-683.

Wolcott, R.M., Vaughn, W.K., Neal, R.A., 1972. Comparison of the mixed function oxidase-catalyzed metabolism of a series of dialkyl p-nitrophenyl phosphorothionates. Toxicology and Applied Pharmacology 22, 213-220.

Wold, S., Esbensen, K., Geladi, P., 1987. Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2, 37-52.

Wold, S., Sjöström, M., Eriksson, L., 2001. PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58, 109-130.

Zhang, H., 2004. A new nonlinear equation for the tissue/blood partition coefficients of neutral compounds. Journal of Pharmaceutical Sciences 93, 1595-1604.

Zhang, H., 2005. A new approach for the tissue - Blood partition coefficients of neutral and ionized compounds. Journal of Chemical Information and Modeling 45, 121-127.

Zhang, H., Zhang, Y., 2006. Convenient nonlinear model for predicting the tissue/blood partition coefficients of seven human tissues of neutral, acidic, and basic structurally diverse compounds. Journal of Medicinal Chemistry 49, 5815-5829.

Zupan, J., Gasteiger, J., 1999. Neural networks in chemistry and drug design.