A neural network for the simulation of biological systems


Journal of Molecular Structure (Theochem) 398-399 (1997) 565-571

Jaroslaw Polanski*

Institute of Chemistry, University of Silesia, ul. Szkolna 9, PL-40006 Katowice, Poland

Abstract

It has been shown that a self-organizing neural network can be assembled with a single instar neuron to provide a method for the efficient analysis of biological data. The method properly models the affinities of the selected compounds to the CBG and TBG (corticosteroid and testosterone binding globulins) and sweet taste receptor-sites. The influence of some parameters on the efficiency of the network has been analysed, and an alternative statistical analysis of the atomic coordinates and charges has been performed. © 1997 Elsevier Science B.V.

Keywords: Neural networks; Principal component analysis; Receptor-sites; Steroids

1. Introduction

The formation of the receptor-ligand complex is a complicated phenomenon that can be analysed from many different points of view, and a description of such a system must account for many effects. The mapping of the receptor-site, which enables one to understand the chemical nature of the interactions, is one of the most important aspects of the problem. Although modern methods allow for the direct modelling of three-dimensional macromolecular receptor structures, it frequently occurs that no structural information on the real receptor is available. In such a case the ligands describe a pharmacophore that indirectly defines the active site, and comparisons between these molecules substitute for an absolute description of the molecular effects. Many different steric and electrostatic parameters are available for the characterization of molecular structures, and many very different techniques are used for the comparison of analogues within an analysed series [1,2]. Neural networks are one of the recent alternatives [3].

One of the most interesting neural applications described in chemistry is probably the two-dimensional map of the electrostatic potential of a molecular surface. Self-organizing neural networks, which were used by Gasteiger and co-workers [3,4] to obtain such maps, are networks trained without supervision and designed to reduce the dimensionality of the input objects while preserving their topology. Such maps allow for a visualization of the interactions of individual compounds with some biological receptors [3-5]. This technique also forms a basis for the comparison between molecules. A reference molecule can be selected to form a template network that is trained with coordinates taken from the van der Waals surface. The coordinates of other molecules can be sent to such a network, and surface properties, e.g. the electrostatic potential, can be projected onto it [6-8]. The resulting comparative feature map is a kind of superimposition of the molecule and the template. The patterns can be compared by simple subtraction of the matrices [7], or classified by the use of a second neural layer [6].

The method can be simplified by replacing the molecular surfaces and electrostatic potential with the atomic coordinates and partial atomic charges [9,10]. The results are comparable with those yielded by larger maps [9]. The aim of this paper is a systematic analysis of the transformation performed by a complex network consisting of small comparative self-organizing maps (SOMs) and a single neuron trained without supervision.

* E-mail: Polanski@usctoux1.cto.us.edu.pl
1 Presented at WATOC '96, Jerusalem, Israel, 7-12 July 1996.

0166-1280/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved
PII S0166-1280(97)00079-1

2. Experimental

2.1. Model building

All structures were modelled with HYPERCHEM 4.0. The MM+ force field was used to optimize the molecules, and partial atomic charges were calculated with the AM1 method.

2.2. Neural procedures

Neural procedures were programmed using the embedded protocols of the MATLAB 4.0 NN-Toolbox for Windows, as detailed in previous publications [9,10]. Principal component analysis (PCA) and other calculations were also carried out in the MATLAB environment. All programmed m-files performing the discussed procedures are available from the author on request.

3. Results and discussion

3.1. Steroid data

The steroid data shown in Table 1 are taken from the literature [11,12]. These compounds are complexed by the corticosteroid (CBG) and testosterone (TBG) binding globulins, which are carrier proteins controlling the level of the biologically active hormone. The rigid steroid skeleton forms an interesting object for molecular similarity analyses, and the series has therefore been discussed in many previous publications [11-14]. It has been found recently, however, that most of the compilations of this data set contain errors [12].

Two different schemes can serve as models for the CBG and TBG receptor-sites. The CBG affinity is controlled by shape factors, and a sterically well-defined cavity can provide the basis for a description of such a receptor-space. Models including polar factors are more predictive for the TBG, and a receptor capable of accommodating different ligand sizes better describes the ligand-receptor interactions [10].

3.2. Neural network architecture

Fig. 1(a) gives the scheme of the neural architecture applied. The three-dimensional Cartesian coordinates of the most active analogue (the template molecule) train the weight matrix, W_T, of the self-organizing network consisting of Kohonen [15] or instar [16] neurons. Individual atoms of the template are distributed between individual output neurons of the P_T map. An additional connection bypasses the training step and transmits the partial atomic charges, S_T, directly to the output neurons of this map. For each neuron an average partial atomic charge is calculated and used for labelling this neuron. This operation is called "colouring" of the map.

The coordinates of each of the other molecules, M_i, are then transformed by the W_T matrix to the respective comparative SOM. "Colouring" with the respective partial atomic charges, S_i, gives a series of comparative maps characterizing both the geometry of the molecules and the charge distribution. Fig. 1(b) shows an example of the distribution of atoms within such maps for three molecules: the template and two analogues of different stereochemistry.

In the next step the maps (charge matrices), P, are transformed to the respective vectors, p, and the Grossberg instar neuron [16] is taught to recognize the template p_T vector. Strictly speaking, it is not this single vector that is presented to the instar during training, but a group of vectors obtained by adding random noise to the replicated prototype. Such a set trains the instar neuron, N, without supervision.
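The comparative-map step described in this section (training the template SOM on atomic coordinates, "colouring" it with averaged partial charges, and projecting another molecule through the same weights) can be sketched in Python with NumPy as follows. This is only an illustrative sketch, not the author's MATLAB m-files. The learning-rate schedule follows our reading of Table 2, footnote c; the neighbourhood radius rule, the Chebyshev grid distance, the label 0 for empty neurons and all coordinates and charges are assumptions made for the example.

```python
import numpy as np

def learning_rate(i):
    """Learning rate in cycle i (i = 1, 2, ...), as read from Table 2, footnote c."""
    return 0.537 * (5.0 / (4.0 + i) + 0.863 ** i)

def grid_distance(rows, cols):
    """Pairwise Chebyshev distances between neurons of a rows x cols grid (assumed)."""
    r, c = np.divmod(np.arange(rows * cols), cols)
    return np.maximum(np.abs(r[:, None] - r[None, :]),
                      np.abs(c[:, None] - c[None, :]))

def train_template_som(coords, rows=3, cols=3, cycles=50, a=0.5, seed=0):
    """Train the weight matrix W_T on the template coordinates (n_atoms x 3)."""
    rng = np.random.default_rng(seed)
    dist = grid_distance(rows, cols)
    maxdist = dist.max()
    # Every weight starts at the midpoint of the x, y and z ranges (footnote c).
    midpoint = (coords.min(axis=0) + coords.max(axis=0)) / 2.0
    w = np.tile(midpoint, (rows * cols, 1))
    for i in range(1, cycles + 1):
        lr = learning_rate(i)
        # ASSUMPTION: the radius shrinks with lr and never drops below 1.
        radius = max(1.0, maxdist * lr)
        for x in coords[rng.permutation(len(coords))]:
            winner = np.argmin(np.linalg.norm(w - x, axis=1))
            # Winner gets the full step, neighbours the constant factor a = 0.5.
            factor = np.where(dist[winner] == 0, 1.0,
                              np.where(dist[winner] <= radius, a, 0.0))
            w += lr * factor[:, None] * (x - w)
    return w

def colour_map(w, coords, charges):
    """Assign atoms to winning neurons and average their partial charges."""
    d = np.linalg.norm(coords[:, None, :] - w[None, :, :], axis=2)
    winners = np.argmin(d, axis=1)
    p = np.zeros(len(w))  # ASSUMPTION: neurons without atoms are labelled 0
    for k in range(len(w)):
        hit = winners == k
        if hit.any():
            p[k] = charges[hit].mean()
    return p

# Hypothetical data standing in for a template and one analogue.
rng = np.random.default_rng(1)
template_xyz, template_q = rng.normal(size=(20, 3)), 0.2 * rng.normal(size=20)
analogue_xyz, analogue_q = rng.normal(size=(17, 3)), 0.2 * rng.normal(size=17)

w_t = train_template_som(template_xyz)
p_t = colour_map(w_t, template_xyz, template_q)   # template map P_T as a vector
p_i = colour_map(w_t, analogue_xyz, analogue_q)   # comparative map of the analogue
```

Because both molecules are projected through the same W_T, the vectors p_t and p_i have the same length (nine elements here) regardless of the number of atoms in each molecule.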
A threshold logic (saturated linear transfer function) is used by the instar neuron, so the neuron outputs a value of 1 when the p_T vector is presented at the input. Smaller values (between 1 and 0) appear for other inputs, p_i, depending upon how closely they are related to p_T. The procedure described includes two main steps. The first one allows for the transformation of molecules of different size into the charge matrices

Table 1. Structure and CBG/TBG affinity data for the steroid series of the SA-SF structures^a

[Skeleton diagrams SA-SF with the substituent positions X1-X10: for the individual structures see ref. [12].]

No.  Skeleton  CBG (pK)  TBG (pK)
 1   SA        -6.279    -5.322
 2   SB        -5.000    -9.114
 3   SE        -5.000    -9.176
 4   SC        -5.763    -7.462
 5   SB        -5.613    -7.146
 6   SC        -7.881    -6.342
 7   SC        -7.881    -6.204
 8   SC        -6.892    -6.431
 9   SE        -5.000    -7.819
10   SC        -7.653    -7.380
11   SC        -7.881    -7.204
12   SB        -5.919    -9.740
13   SD        -5.000    -8.833
14   SD        -5.000    -6.633
15   SD        -5.000    -8.176
16   SB        -5.225    -6.146
17   SE        -5.225    -7.146
18   SE        -5.000    -6.362
19   SC        -7.380    -6.944
20   SC        -7.740    -6.996
21   SC        -6.724    -9.204
22   SF        -7.512    ?^d
23   SC        -7.553    ?^d
24   SC        -6.779    ?^d
25   SC        -7.200    ?^d
26   SC^e      -6.144    ?^d
27   SC        -6.247    ?^d
28   SC        -7.120    ?^d
29   SC^e      -6.817    ?^d
30   SC        -7.688    ?^d
31   SC        -5.797    ?^d

a Structures according to [12]. CBG, corticosteroid binding globulin; TBG, testosterone binding globulin. Affinity data according to [14].
b Of the 5-α series.
c Of the 5-β series.
d Unknown.
e H (hydrogen) instead of Me at C10 of the steroid skeleton.

Fig. 1. (a) The scheme of the architecture of the receptor-like neural network consisting of the comparative self-organizing map (SOM) layer yielding normalized P_i charge matrices and the recognition instar neuron (details in text). (b) An example of the distribution of the atoms within comparative SOMs for the template molecule 6 and two analogues: 1, with stereochemistry complying with 6, and 2, with stereochemistry different from that of the template molecule 6. The atoms found in the respective neurons of the SOMs are labelled with their numbers. Only non-hydrogen atoms are shown.
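The recognition instar neuron of Fig. 1(a) can be illustrated with the short Python sketch below. It assumes that the p vectors are normalized to unit length, that the neuron output is forced to 1 while the noisy template replicas are presented (so the instar rule reduces to moving the weights toward the input), and that the noise vector has a length equal to 10% of the p_T length. The function names, the number of replicas, the learning rate and the example vectors are all hypothetical.

```python
import numpy as np

def satlin(x):
    """Saturated linear transfer function: 0 below 0, x in [0, 1], 1 above 1."""
    return np.clip(x, 0.0, 1.0)

def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def train_instar(p_t, n_copies=200, noise=0.10, lr=0.1, seed=0):
    """Train instar weights on noisy replicas of the template vector p_T."""
    rng = np.random.default_rng(seed)
    p_t = unit(p_t)
    w = np.zeros_like(p_t)
    for _ in range(n_copies):
        # Noise vector whose length is `noise` times the length of p_T.
        noisy = p_t + noise * unit(rng.normal(size=p_t.shape))
        # Instar rule dw = lr * a * (p - w) with the output a fixed at 1 (assumed).
        w += lr * (noisy - w)
    return w

def recognise(w, p):
    """Neuron output for an input map vector p."""
    return float(satlin(w @ unit(p)))

# Hypothetical nine-element map vectors (3 by 3 SOM).
p_template = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
p_other = p_template[::-1]

w = train_instar(p_template)
out_template = recognise(w, p_template)  # close to 1
out_other = recognise(w, p_other)        # clearly smaller
```

With this setup the trained weights approximate the template direction, so the output approaches 1 for the template and falls off as an input vector departs from it.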

Table 2. Statistical characterization of the predictivity of the affinity models obtained from the neural network

                                                                  Regression model^e
No.  Affinity^a  Receptor  Training   Neighbourhood and size      n    R      s       F-test
                 type^b    protocol^c of the SOM network^d
 1   CBG         S         I          grid 3 by 3                 31   0.89   0.49
 2   TBG         IF                   grid 3 by 3                 30   0.84   0.67
 3   TBG         IF                   grid 3 by 3                 30   0.92   0.48^f
 4   sweet       S         K          grid 3 by 3                  9   0.95   0.12
 5   CBG         S         K          grid 3 by 3                 31   0.93   0.37    198.27
 6   CBG         S         K          grid 4 by 4                 31   0.87   0.47     92.33
 7   CBG         S         K          grid 5 by 5                 31   0.78   0.53     46.06
 8   CBG         S         K          grid 6 by 6                 31   0.57   0.51     14.00
 9   CBG         S         K          manhattan 2 by 2            31   0.86   0.48     83.96
10   CBG         S         K          manhattan 3 by 3            31   0.80   0.52     53.48
11   CBG         S         K          manhattan 4 by 4            31   0.79   0.53     48.67
12   CBG         S         K          manhattan 5 by 5            31   0.67   0.55     23.73
13   CBG         S         K          toroid 4 by 4               31   0.60   0.55     26.93
14   CBG         S         K          toroid 5 by 5               31   0.75   0.55     36.53
15   CBG         S         K          toroid 6 by 6               31   0.76   0.54     39.01
16   CBG         S         K          toroid 7 by 7               31   0.67   0.55     23.02
17   CBG         S         K          chain 3                     31   0.83   0.51     62.29
18   CBG         S         K          chain 5                     31   0.67   0.55     23.63
19   CBG         S         K          chain 6                     31   0.70   0.55     27.56
20   CBG         S         K          chain 7                     31   0.82   0.52     59.92
21   CBG         S         K          chain 8                     31   0.64   0.54     20.07
22   CBG         S         K          chain 9                     31   0.78   0.54     45.07
23   CBG         S         K          chain 10                    31   0.65   0.54     21.30

a CBG and TBG for steroid structures, sweetness for arylsulphonylalkanoic acids; structures and activities according to [17,18].
b S, stiff; IF, with induced fit. The detailed procedure for the IF receptor type is given in [10].
c I, instar neurons; K, Kohonen neurons. Learning starts from an initial value of the learning rate (lr) = 1, which decreases during learning according to $lr = 0.537\,(5/(4 + i) + 0.863^{i})$, where i is the number of cycles. The width of the neighbourhood (n) included in the training depends upon the maximal distance between neurons within the SOM (maxdist) and decreases during learning: n = maxdist^[exponent illegible] (or n = 1 if the calculated n < 1). A constant neighbourhood function a = 0.5 was used, and training starts with the setting of each element of the weight matrix to the average of the minimum and maximum values of the respective x, y and z rows of the input matrix. Random noise added to the p_T vectors amounts to 10% of the p_T vector length.
d As defined in [9]; a detailed description of the individual neighbourhoods can be found in [19], and of the toroid in [3].
e For the relationship observed vs. generated affinity. Models 1-3 are discussed in detail in [10] and model 4 in [17]. R, correlation coefficient; s, standard deviation.
f With separated steric and polar effects.

consisting of the same number of elements, which makes their comparison possible. In the second step the similarity of each map to the template SOM pattern is estimated by the assembled instar neuron. The geometry and charge distribution of the template are the only information given to these networks during training. This means that the method can be used for predictive purposes.

3.3. Predictions

The steroid series was used to estimate the predictive power of the network. This power was characterized (Table 2) by the correlation coefficients and standard deviations of the relationships between the observed affinities and the signals generated by the network. (In practice, these signals were scaled linearly into the range of the respective affinity.) The method gives satisfactory predictions for the affinities of different receptors, i.e. CBG (models 1 and 5) and sweet taste (model 4). Some modifications [10] allow for modelling TBG (models 2 and 3). The network can also be used for finding an active pharmacophore conformation [18]. The predictive power of the optimal models exceeds that reported for steroids in the literature [12].

A careful analysis of the performance of the network, measured by the correlations between predicted and observed values of the CBG affinities, shows that the Kohonen neurons applied in the SOM layer are more effective than the instar ones (model 5 vs. 1). The effect of the dimensionality, size and connectedness of the SOM units is illustrated in Table 2, entries 5-23. In general, smaller maps give better results. The toroid neighbourhood is the only exception here. In this particular case, the correlation coefficient increases up to a size of 6 by 6 and then starts to decrease. One-dimensional or so-called chain networks (models 17-23) are slightly less effective than two-dimensional ones.

Fig. 2 illustrates the influence of the level of random noise added to the template p_T vector on the performance of the network. The ratio of the standard deviation of the correlation coefficients, R, and of the standard deviations, s, estimated for 100 random runs of the network, to their respective mean values was used to characterize the reproducibility of the R and s parameters describing model 1 (Table 2). For low noise levels (below 10% of the length of the p_T vector) the relative deviations of the calculated parameters do not exceed about 1%. The higher the noise level, the larger the variation observed.
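The statistics used in this section (linear scaling of the network signals into the affinity range, followed by R, s and an F value for the observed vs. generated relationship) can be sketched as below. The sketch assumes min-max scaling with the largest signal mapped to the upper end of the range, s taken as the residual standard deviation of a one-variable least-squares line, and F computed from R and n for a single regressor; the paper does not state these details, and the numbers in the example are hypothetical.

```python
import numpy as np

def scale_to_range(signals, lo, hi):
    """Min-max scale network signals linearly into the interval [lo, hi]."""
    signals = np.asarray(signals, dtype=float)
    span = signals.max() - signals.min()
    return lo + (signals - signals.min()) / span * (hi - lo)

def regression_stats(observed, predicted):
    """Return (R, s, F) for the regression of observed on predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = len(observed)
    r = np.corrcoef(predicted, observed)[0, 1]
    slope, intercept = np.polyfit(predicted, observed, 1)
    residuals = observed - (slope * predicted + intercept)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))   # residual standard deviation
    f = r ** 2 * (n - 2) / (1.0 - r ** 2)           # F for one regressor
    return r, s, f

# Hypothetical observed values and raw network signals.
observed = np.array([1.0, 2.0, 3.0, 4.0])
signals = np.array([0.0, 0.25, 0.5, 1.0])
generated = scale_to_range(signals, observed.min(), observed.max())
r, s, f = regression_stats(observed, generated)
```

As a plausibility check on the assumed F relation, n = 31 and R = 0.87 give F of about 90, close to the value of 92.33 listed for model 6 in Table 2; the difference is consistent with R being rounded to two decimals.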

Fig. 2. The relative standard deviation of the correlation coefficient, DR (×), and of the standard deviation, Ds (○), estimated for 100 random runs of the network and plotted against the total percentage noise added to the template vector p_T. The data and network parameters are those shown in Table 2, model 1.
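The reproducibility measure plotted in Fig. 2 is a relative standard deviation over repeated random runs. A minimal sketch is given below; the function names are hypothetical, the use of the sample standard deviation (ddof = 1) is an assumption, and `toy_run` is only a stand-in for one full training and evaluation of the network, with an invented spread around the R and s values of model 1.

```python
import numpy as np

def relative_std(values):
    """Standard deviation of the values divided by the magnitude of their mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / abs(values.mean())

def reproducibility(run_once, n_runs=100, seed=0):
    """Repeat a stochastic run and return the relative spreads (D_R, D_s)."""
    rng = np.random.default_rng(seed)
    results = np.array([run_once(rng) for _ in range(n_runs)])  # (n_runs, 2)
    return relative_std(results[:, 0]), relative_std(results[:, 1])

def toy_run(rng):
    """Hypothetical stand-in returning (R, s) for one noisy run."""
    return 0.89 + rng.normal(scale=0.005), 0.49 + rng.normal(scale=0.005)

d_r, d_s = reproducibility(toy_run)
```

In a real experiment `run_once` would retrain the instar neuron with a given noise level and return the R and s of the resulting model, and the calculation would be repeated for each noise level on the horizontal axis.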

3.4. Statistical analysis of atomic coordinates and charges

We used standard deviations and principal component analysis (PCA) to analyse the molecular data input to the network. Table 3 shows the correlation coefficients between the CBG (TBG) affinities and the calculated parameters. The parameters clearly fall into two groups: those correlating better with CBG correlate poorly with TBG, and vice versa. The CBG affinity correlates well with sdx (sdy, sdz), the standard deviation of the x (y, z) coordinates of a given molecule; with pc(sdx, sdy, sdz)_1, the first principal component calculated for the sdx, sdy and sdz parameters submitted to PCA; and with pc(sdx, sdy, sdz, sdq)_1, the first principal component calculated for sdx, sdy, sdz and sdq (sdq is the standard deviation of the partial atomic charges of a given molecule). This shows that shape factors limit the CBG affinity. The TBG affinity gives the best correlation with the sdq parameter. These results compare well with those of the neural network in indicating which variables limit the affinities, but the R and s values (Table 3) do not provide predictive power. Moreover, some outliers must be rejected to obtain the best models.

4. Conclusions

A method for modelling the biological affinities of different receptor-sites has been presented. The method can be used to predict the affinities of newly designed analogues. In general, smaller SOM networks give better models, and the Kohonen protocol is more efficient than the instar rule. Molecular data analysed by statistical procedures, including the PCA technique, provide parameters which indicate the effects limiting the affinity.

Acknowledgements

Financial support from the Rector of the University of Silesia and the Organizers of the WATOC Conference, which made it possible to present this paper at the IV WATOC Conference in Jerusalem, is gratefully acknowledged.

Table 3. Correlation of CBG and TBG affinities with the parameters obtained from the statistical analysis of molecular data

                                   Regression models^b
                                   TBG              CBG
No.  Parameter^a                   R      s         R      s
1    sdx                           0.46   1.10      0.87   0.54
2    sdy                           0.38   1.15      0.87   0.55
3    sdz                           0.58   1.02      0.77   0.73
4a   pc(sdx,sdy,sdz)_1             0.45   1.11      0.89   0.51
4b   pc(sdx,sdy,sdz)_1                              0.93   9.6 × 10^-4 ^c
5a   sdq                           0.71   0.88      0.62   0.86
5b   sdq                           0.85   0.67^d
6a   pc(sdx,sdy,sdz,sdq)_1         0.44   1.11      0.88   0.52
6b   pc(sdx,sdy,sdz,sdq)_1                          0.93   9.4 × 10^-4 ^c
6c   pc(sdx,sdy,sdz,sdq)_1                          0.96   7.8 × 10^-4 ^e

a sdx, sdy, sdz = standard deviation of the individual x, y or z coordinates; sdq = standard deviation of the partial atomic charges; pc(sdx,sdy,sdz)_1 = first principal component calculated for the sdx, sdy, sdz matrix; pc(sdx,sdy,sdz,sdq)_1 = first principal component calculated for the sdx, sdy, sdz, sdq matrix.
b R = correlation coefficient, s = standard deviation.
c Versus exp(CBG).
d Without 5 and 16.
e Versus exp(CBG), without 12.
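The descriptors defined in footnote a of Table 3 can be computed along the following lines. This is a sketch under stated assumptions: the sample standard deviation (ddof = 1) is used, the descriptor matrix is only mean-centred before PCA (the paper does not say whether the columns were also autoscaled), and the function names and the example molecule are hypothetical. The sign of a principal component is arbitrary, so only the magnitudes of the scores are meaningful when comparing with affinities.

```python
import numpy as np

def descriptors(coords, charges):
    """Return [sdx, sdy, sdz, sdq] for one molecule."""
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    sdx, sdy, sdz = coords.std(axis=0, ddof=1)
    return np.array([sdx, sdy, sdz, charges.std(ddof=1)])

def first_principal_component(X):
    """Scores of the rows of X (molecules x descriptors) on the first PC."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # mean-centring only (assumed)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

# Hypothetical two-atom "molecule" lying along the x axis.
d = descriptors([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [0.1, -0.1])

# pc(sdx, sdy, sdz)_1 would use the first three descriptor columns of all
# molecules; pc(sdx, sdy, sdz, sdq)_1 would use all four.
scores = first_principal_component([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
```

The resulting score vector has one entry per molecule and can be correlated with the CBG or TBG affinities in the same way as the single descriptors.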

References

[1] K. Sen (Ed.), Molecular Similarities I, Topics in Current Chemistry, vol. 173, Springer, Berlin, 1995.
[2] K. Sen (Ed.), Molecular Similarities II, Topics in Current Chemistry, vol. 173, Springer, Berlin, 1995.
[3] J. Gasteiger, J. Zupan, Neural Networks for Chemists: An Introduction, VCH, Weinheim, 1993, Chapter 19.
[4] J. Gasteiger, X. Li, Ch. Rudolph, J. Sadowski, J. Zupan, J. Am. Chem. Soc. 116 (1994) 4608, and references cited therein.
[5] J. Gasteiger, X. Li, Angew. Chem. 106 (1994) 671.
[6] J. Polanski, J. Gasteiger, The Comparison of Molecular Surfaces by an Assembly of Self Organizing Neural Network, Computers in Chemistry '94, Technical University of Wroclaw, National Institute of Standards and Technology, Gaithersburg, MD, USA; Wroclaw, 1994, p. 88.
[7] T.W. Barlow, J. Mol. Graphics 13 (1995) 24.
[8] S. Anzali, G. Barnickel, M. Krug, J. Sadowski, M. Wagener, J. Gasteiger, J. Polanski, J. Comput.-Aided Mol. Design, in press.
[9] J. Polanski, J. Chem. Inf. Comput. Sci. 36 (1996) 694.
[10] J. Polanski, J. Chem. Inf. Comput. Sci., submitted for publication.
[11] A.C. Good, S.J. Peterson, W.G. Richards, J. Med. Chem. 36 (1993) 2929.
[12] M. Wagener, J. Sadowski, J. Gasteiger, J. Am. Chem. Soc. 117 (1995) 7769.
[13] R.D. Cramer III, D.E. Patterson, J.D. Bunce, J. Am. Chem. Soc. 110 (1988) 5959.
[14] A.C. Good, S.S. So, W.G. Richards, J. Med. Chem. 36 (1993) 433.
[15] T. Kohonen, Self-Organization and Associative Memory, 3rd edn., Springer, Berlin, 1989.
[16] S. Grossberg, Studies of the Mind and Brain, Reidel, Dordrecht, 1982.
[17] J. Polanski, A. Ratajczak, J. Gasteiger, Z. Galdecki, E. Galdecka, J. Mol. Struct., in press.
[18] J. Polanski, A. Ratajczak, in: M. Matlouthi, J.A. Kanters, G.G. Birch (Eds.), Mini Ecro Symposium, Reims, France, July 1991, Elsevier, Barking, 1993, p. 185.
[19] W.J. Melssen, J.R.M. Smits, L.M.C. Buydens, G. Kateman, Chemom. Intell. Lab. Syst. 23 (1994) 267.