Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions

Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions

ARTICLE IN PRESS JID: PATREC [m5G;July 16, 2016;4:1] Pattern Recognition Letters 0 0 0 (2016) 1–9 Contents lists available at ScienceDirect Patte...

1MB Sizes 3 Downloads 76 Views

ARTICLE IN PRESS

JID: PATREC

[m5G;July 16, 2016;4:1]

Pattern Recognition Letters 0 0 0 (2016) 1–9

Contents lists available at ScienceDirect

Pattern Recognition Letters journal homepage: www.elsevier.com/locate/patrec

Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensionsR Pierre-Anthony Grenier a,∗, Luc Brun a, Didier Villemin b a b

Normandie University, Caen, ENSICAEN, GREYC, France Normandie University, Caen, ENSICAEN, LCMT, France

a r t i c l e

i n f o

Article history: Available online xxx Keywords: Chemoinformatics Stereoisomerism Graph kernel

a b s t r a c t In chemoinformatics, Quantitative Structure Activity and Property Relationships (QSAR and QSPR) are two fields which aim to predict properties of molecules thanks to computational techniques. In these fields, graph kernels provide a powerful tool which allows to combine the natural encoding of molecules by graphs with usual statistical tools. However, some molecules may have a same graph but differ by the three dimensional orientation of their atoms in space. These molecules, called stereoisomers, may have different properties which cannot be correctly predicted using usual graph encodings. In a previous study we proposed to encode the stereoisomerism property of each atom by a local subgraph, called minimal stereo subgraph, and we designed a kernel based on the comparison of bags of such subgraphs. This kernel allows to predict properties induced by the stereoisomerism which cannot be correctly predicted using usual graph kernels. However, it has two major drawbacks : it considers each minimal stereo subgraph without taking into account its surroundings, and it considers that two non identical minimal stereo subgraphs have a null similarity. In this paper we present three extensions to tackle those drawbacks. The first extension allows to take into account interactions between minimal stereo subgraphs. The second extension allows to compare the neighborhood of minimal stereo subgraphs. And finally, the third extension provides a measure of similarity between different minimal stereo subgraphs. © 2016 Published by Elsevier B.V.

1. Introduction Prediction of molecular properties is based upon a similarity principle which states that: “Similar molecules should have similar properties”. Methods based on this principle involve the design of a model encoding molecules and a similarity measure between such models. The simplest model which can represent a molecule is its molecular formula (e.g. CH4 ). However, as this representation does not encode the bond connections between atoms, different molecules, called structural isomers, can have a same molecular formula. An usual way to overcome this limitation consists in using molecular graphs. A molecular graph is a simple graph G = (V, E, μ, ν ), where each node v ∈ V encodes an atom, each edge e ∈ E a bond between two atoms and the labeling functions μ and ν associate to each vertex and each edge a label encoding respectively the nature of the atom (carbon, oxygen, . . . ) and the type

R ∗

This paper has been recommended for acceptance by Cheng-Lin Liu. Corresponding author. Tel.: +33231452704. E-mail address: [email protected] (P.-A. Grenier).

of the bond (single, double, triple or aromatic). Molecular graphs, allow to encode neighborhood relationships between atoms, and thus allow to differentiate structural isomers. Graph kernels provide a measure of similarity between graphs. Under the assumption that a kernel k is symmetric and definite positive, the value k(G, G ), where G and G encode two graphs, corresponds to a scalar product between two vectors  (G) and  (G ) in an Hilbert space implicitly defined by k. This latter property allows to combine graph kernels with usual machine learning methods such as SVM. Graph kernels have been successfully used to predict molecular properties [5,12]. In a molecular graph, we can only obtain the list of neighbors of each atom. Thus molecular graphs also have a limitation: they do not encode the spatial configuration of neighbors of each atom. Some molecules, called stereoisomers, are associated to a same molecular graph but differ by the relative positioning of their atoms. As molecular graphs cannot distinguish those molecules, properties which vary between stereoisomers cannot be predicted by usual graph kernels. This can be problematic since some stereoisomers may have very different biological properties. Indeed, stereoisomers of some common drugs may be considered as violent poisons. For example, a molecule called thalidomide was

http://dx.doi.org/10.1016/j.patrec.2016.06.025 0167-8655/© 2016 Published by Elsevier B.V.

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025

JID: PATREC 2

ARTICLE IN PRESS

[m5G;July 16, 2016;4:1]

P.-A. Grenier et al. / Pattern Recognition Letters 000 (2016) 1–9

Fig. 1. Two different spatial configurations of the neighbors of a carbon.

Fig. 2. Two different spatial configurations of two carbons linked by a double bond.

sold in the late fifties as an anti nausea for pregnant women. However, it turns out that one of the stereoisomer of this molecule could cause fetal malformation. Most of stereoisomers are characterized by the three dimensional orientation of the direct neighbors of a single atom or two connected atoms. We can imagine for example, a carbon atom, with four neighbors, each of them located on a summit of a tetrahedron. If we permute two of the atoms, we obtain a different spatial configuration (Fig. 1). An atom is called a stereocenter if a permutation of two atoms belonging to its neighborhood produces a different stereoisomer. We should stress here that, to a large extend, stereoisomerism is independent of a particular embedding of a molecule. Indeed, in Fig. 1, any particular embedding keeping the same relative positioning of atoms H, Cl, Br and F according to the central carbon atom C, would correspond to a same stereoisomer. In the same way, two connected atoms form a stereocenter if a permutation of the positions of two atoms belonging to the union of their neighborhoods produces a different stereoisomer (Fig. 2). According to chemical experts [10], within molecules currently used in chemistry, 98% of stereocenters correspond either to carbons with four neighbors, called asymmetric carbons (Fig. 1) or to couples of two carbons adjacent through a double bond (Fig. 2). We thus restrict the present paper to such cases. The 2% of other types of stereocenters correspond to heteroatoms (sulfur, phosphorus, ...) and metal in organometallic complexes. In [7] we have proposed to encode stereoisomers by ordered graphs. Intuitively, stereoisomerism property is related to the fact that permuting two neighbors of a stereocenter produces a different spatial configuration. If those two neighbors have a same label, the influence of the permutation should be searched beyond the direct neighborhood of this stereocenter. Based on this ascertainment, we have proposed in [7] to characterize a stereocenter by a subgraph, called minimal stereo subgraph, big enough to highlight the influence of each permutation of the neighbors of this stereocenter but sufficiently small to provide a local characterization of it. We then proposed a kernel based on those subgraphs. Another method which incorporates stereoisomerism within the graph kernel framework was proposed by [3]. This method consists in an extension of the tree-pattern kernel [12], where the similarity is deduced from the number of common tree-patterns between two molecules. There is two main differences between the kernels of [3] and [7]. Firstly, in [3], patterns which encode stereo information and patterns which do not, are combined without any weighting in the final kernel value. [7], takes only into account patterns which encode stereo information. Thus the similarity of two molecules without any stereo information will be null according to this kernel. This is expected since there is in this case no similarity related

to stereoisomerism between these two molecules. If a molecule property is related both to stereoisomerism and other structural information our kernel may be combined as a weighted sum with standards kernels. Note that [3] does not have this ability to weight the steroisomerism information. Secondly, the size of patterns is limited by a parameter in [3] and determined by the configuration around a stereocenter in [7]. Thus if a minimal stereo subgraph is smaller than a tree-pattern of [3], this pattern may encode the minimal stereo subgraph and its neighborhood. This point may be view as a drawback of [7] since identical minimal stereo subgraphs with different neighborhoods may have different influences on a property. Finally both methods have a common drawback, the similarity of different patterns is set to zero. However the principle of similarity may also be applied on patterns. Hence, we may consider that similar patterns have a similar influence on properties. In this paper we present three extensions of the kernel of [7] that overcome the limitations previously mentioned. We will recall the definition of [7] in Section 2. The first and second extensions, presented respectively in Sections 3 and 4, allow to take into account the neighborhood of minimal stereo subgraphs. The extension of Section 3 have a global approach to this problem: we take into account the interactions of a minimal stereo subgraph with the other minimal stereo subgraphs of a molecule. The extension of Section 4 have a more local approach, where we take into account the direct neighborhood of each minimal stereo subgraph. Finally, the aim of the third extension is to define a measure of similarity between different minimal stereo subgraphs (Section 5). 2. Ordered graphs and minimal stereo subgraphs 2.1. Encoding of molecules by ordered graphs The spatial configuration of the neighbors of each atom may be encoded through an ordering of its neighborhood. For example, considering the left part of Fig. 1, and looking at the central carbon from the hydrogen atom (H), the sequence of remaining neighbors of the carbon: Cl, Br and F may be considered as lying on a plane and are encountered clockwise. Thus, this spatial configuration is encoded by the sequence (H, Cl, Br, F) and the sequence (H, Br, Cl, F) encodes the second configuration. The configuration around a double bond can also be encoded by ordered sequences. Considering the left part of Fig. 2 and assuming a clockwise orientation with the plane embedding provided by this figure, we encounter H and Cl when turning around the carbon at the left of the molecule, and Br and H for the carbon at the right. Thus this configuration may be encoded by both sequences (H, Cl) and (Br, H) respectively for the left and right carbon atoms. Sequences (H, Cl) and (H, Br) encode the second configuration. In order to encode this information, we introduce the notion of ordered graph. An ordered graph G = (V, E, μ, ν, ord ) is a molecular graph Gm = (V, E, μ, ν ) together with a function ord: V → V∗ which maps each vertex to an ordered list of its neighbors. Two ordered graphs G and G are isomorphic (G  G ) if there exists an o

isomorphism f between their respective molecular graphs Gm and Gm such that ord ( f (v )) = ( f (v1 ) . . . f (vn )) with ord (v ) = (v1 . . . vn ) (where N (v ) = {v1 , . . . , vn } denotes the neighborhood of v). In this case f is called an ordered isomorphism between G and G . Note that the set of edges E is redundant, as the edges are implicitly given by the order function ord. We however keep the set E in order to stress the graph structure of ordered graphs. However, different ordered graphs may encode a same molecule. We thus have to define an equivalence relationship between ordered graphs, such that two ordered graphs are equivalent if they represent a same molecular configuration. To do so, we introduce the notion of re-ordering function σ , which associates to each vertex v ∈ V of degree n a permutation

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025

ARTICLE IN PRESS

JID: PATREC

[m5G;July 16, 2016;4:1]

P.-A. Grenier et al. / Pattern Recognition Letters 000 (2016) 1–9

3

Fig. 3. Two different views of a same molecule, giving two different orders. Fig. 4. An asymmetric carbon and its associated sequence (HCk )3k=1 .

σ (v ) on {1, . . . , n}, which allows to re-order its neighborhood. The graph with re-ordered neighborhoods σ (G) is obtained by mapping for each vertex v its order ord (v ) = v1 . . . vn onto the sequence vσ (v)(1) . . . vσ (v)(n) where σ (v ) is the permutation applied on v.

Permutations associated to a carbon with four neighbors correspond to all even permutations of its four neighbors. We can easily check that the different orders obtained by these permutations, encode a same configuration but either seen from a different neighbor or with a same view point but with a different encoding of the cyclic order of the three remaining neighbors. For example, even permutations (1, 4)(2, 3) and (2, 3)(3, 4) applied on the order (H, Cl, Br, F) of the central carbon of the molecule in the left part of Fig. 1 produce respectively the orders (F, Br, Cl, H) and (H, Br, F, Cl) which both encode the same configuration. The order (F, Br, Cl, H) corresponds to a vision of the central carbon of the molecule but this time seen from F (Fig. 3 (2)). So it represents the same configuration. For a double bond between two carbons, permutations associated to each carbon of the double bound must have a same parity. In the same way, we can check that these permutations correspond to different representations of a same configuration. The set of re-ordering functions, transforming an ordered graph into another one representing the same configuration is called a valid family of re-ordering functions  [8]. Using our restriction to carbon atoms and double bounds, an element σ belongs to  iff [8] σ (v ) is even for any v corresponding to a carbon atom with four neighbors and  (σ (v )) =  (σ (n= (v ))) for any v belonging to a double bound where  (.) encodes the parity of a permutation and n= (v ) encodes the other atom of the double bound incident to v. We say that it exists an equivalent ordered isomorphism f between G and G according to  if it exists σ ∈  such that f is an ordered isomorphism between σ (G) and G (σ (G )  G ). The equivo

alent order relationship defines an equivalence relationship [8] and two different stereoisomers are encoded by non equivalent ordered graphs. We denote by IsomEqOrd(G, G ) the set of equivalent ordered isomorphisms between G and G . Carbons with four neighbors, and double bonds between carbons, are not necessarily stereocenters. If they are not stereocenters, any permutation in their neighborhood would lead to an equivalent ordered graph. We thus define for an ordered graph G = (V, E, μ, ν, ord ) and one of its vertex v ∈ V a set of ordered isomorphism FGv , which are the isomorphisms between G and the ordered graphs obtained by permuting the neighbors of v:

FGv =

 (i, j )∈{1,...,|N (v )|}2

    f ∈ IsomEqOrd(G, τ v (G )) i, j f  with f (v ) = v

i= j

where τi,v j is a re-ordering function equals to the identity on all vertices except v for which it permutes the vertices of index i and j in ord (v ). We then define a stereo vertex as a vertex for which any permutation of two of its neighbors produces a non-equivalent ordered graph:

Definition 1 (Stereo vertex). Let G = (V, E, μ, ν, ord ) be an ordered graph. A vertex v ∈ V is called a stereo vertex iff FGv = ∅. Two carbons linked by a double bond form a stereocenter and we have proved in [8] that if a carbon of a double bond is a stereo vertex, then the other one is also a stereo vertex. Therefore we denote by kernel(s) the set of stereo vertices corresponding to a stereocenter (kernel (s ) = {s} if s is an asymmetric carbon and kernel (s ) = {s, n= (s )} if s is a carbon of a double bond, where n= (s ) denotes, as previously mentioned, the other carbon of the double bond). We further denote by StereoStar(s) the set composed of a stereocenter and its neighborhood: StereoStar (s ) = kernel (s ) ∪ N (kernel (s )). 2.2. Minimal stereo subgraphs Definition 1 is based on the whole graph G to test if a vertex

v is a stereo vertex. However, given a stereo vertex s, one can observe that on some configurations, the removal of some vertices far from s should not change its stereo property. In order to obtain a more local characterization of a stereo vertex, we should thus determine a vertex induced subgraph H of G, including s, large enough to characterize the stereo property of s, but sufficiently small to encode only the relevant information characterizing the stereo property of s. Such a subgraph is called a minimal stereo subgraph of s. We now present a constructive definition of a minimal stereo subgraph of a stereo vertex. Let s denotes a stereo vertex and let Hs be a subgraph of G containing kernel(s). We say that the stereo property of s is not captured by Hs if (Definition 1):

FHs s = ∅

(1)

To define a minimal stereo subgraph of s, we consider a finite sequence (Hsk )nk=1 of vertex induced subgraphs of G. The first element of this sequence Hs1 is the smallest vertex induced subgraph for which we can test (1): V (Hs1 ) = StereoStar (s ). If the current vertex induced subgraph Hsk does not capture the stereo property of s, we know by (1), that it exists some isomorphisms f ∈ F s k . We denote by E kf the set of vertices of Hsk inducing Hs

the isomorphism f in Hsk :

⎧  ⎫ ∃ p = (v0 , . . . , vq ) ∈ Hsk ⎪ ⎪  ⎨ ⎬  with v0 ∈ kernel (s ), E kf = v ∈ V (Hsk ) vq = v ⎪ ⎪ ⎩ ⎭  and f (v1 ) = v1

(2)

where p is a path. For example in Hs2 (Fig. 4), the two paths C − C − O are mapped one onto the other by an isomorphism f and thus belong to E 2f . In [8], we show that for any f in F s k , E kf is not empty. A verHs

tex v belongs to E kf if neither its label nor its neighborhood in Hsk

allow to differentiate it from f (v ). The basic idea of our algorithm consists in enforcing constraints on each v ∈ E kf at iteration k + 1

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025

ARTICLE IN PRESS

JID: PATREC 4

[m5G;July 16, 2016;4:1]

P.-A. Grenier et al. / Pattern Recognition Letters 000 (2016) 1–9

by adding to Hsk the neighborhood of v in G. The set of vertices of the vertex induced subgraph Hsk+1 is thus defined by:

V (Hsk+1 ) = V (Hsk ) ∪



N (E kf )

(3)

f ∈F s k Hs

where N (E kf ) denote the neighborhood of E kf . The algorithm stops when the set F s k becomes empty. Algorithm 1 Construction of a minimal stereo subgraph. Input: a stereo vertex s and an ordered molecular graph G Output: a minimal stereo sub graph Hs1 ← {s} ∪ N (s ) (Fs1 , E 1f ) ← getIsomor phism(Hs1 ) k←1 while Fsk = ∅ do k←k+1 V (Hsk ) ← V (Hsk−1 ) ∪ N (E kf −1 )

Fi (S1 , S2 ) = k ⇒ ∀ j > k Fj (S1 , S2 ) = 0

The main steps of our method are summed up in Algorithm 1. The function getIsomorphism uses a fast isomorphism algorithm [2] to compute the isomorphisms f between Hsk and τi,s j (Hsk ) and

the sets E kf for each (i, j ) ∈ {1, . . . , |N (s )|}2 . Moreover, in order to improve execution times, isomorphisms Fsk−1 found during a previous iteration between Hsk−1 and τi,s j (Hsk−1 ) are used to initialize the isomorphism algorithm at step k. Finally, minimal stereo subgraphs correspond to a local characteristic and have consequently a limited size [7]. We proved in [8] that the subgraph obtained by this algorithm captures the stereo property of s. Fig. 4 illustrates our algorithm. Thus for each stereo vertex we can construct its minimal stereo subgraph to characterize it. We consider two stereo vertices as similar if they have a same minimal stereo subgraph, and to test it efficiently, we transform our minimal stereo subgraphs S into codes cS thanks to the method described in [17]. The stereo kernel [7] is defined by: (4)

H∈H (G )∩H (G )

where K denotes a kernel between real values, H (G ) the set of minimal stereo subgraphs of G and SH (G) the number of occurrences of the minimal stereo subgraph H in G. This kernel allows to encode the stereoisomerism property and unlike [3], does not use additional patterns not related to the stereoisomerism. However, as said in the introduction, this kernel have two main drawbacks : the minimal stereo subgraphs are considered independently and only identical minimal stereo subgraphs are considered as similar. 3. Graphs of interactions In this section we propose to encode interactions between minimal stereo subgraphs. To do so, we define some functions of interactions between minimal stereo subgraphs. We define different functions because we cannot know a priori how “close” two minimal stereo subgraphs have to be in order to interact [1]. Functions of interactions are defined according to a sequence of conditions (c1 , . . . , cn ). These conditions are increasingly constraining:

∀i ∈ {1, . . . , n − 1} ci+1 ⇒ ci

(5)

If a function of interactions Fi (S1 , S2 ) is equal to zero, it means that we consider that S1 does not interact with S2 . Note that :

end while

K (SH (G ), SH (G )).

c2 : kernel (s1 ) ⊂ S2 c4 : S1 ⊂ S2

Fi (S1 , S2 ) = max{ j ∈ {i, . . . , 4} ∪ {0} | c j }

(Fsk , E kf ) ← getIsomor phism(Hsk , Fsk−1 )

 c1 : S1 S2 = ∅ c3 : StereoStar (s1 ) ⊂ S2

We consider in this paper three functions of interactions Fi . Each function Fi is defined by only using conditions cj with j in {i, . . . , 4} ∪ {0}, where c0 is defined as ¬ci . The value Fi (S1 , S2 ) is obtained by taking the index j of conditions cj which represents the strongest interaction between S1 and S2 :

Hs

k(G, G ) =

Let S1 and S2 be two minimal stereo subgraphs of a same ordered graph, such that s1 is the stereo vertex of S1 and s2 is the stereo vertex of S2 . We propose the following set of conditions:

Thus a function Fj , with an index j > i, is more restrictive than Fi . Note that (Fi )i ∈ {1, 2, 3} are non symmetric functions. We define thanks to those functions, three graphs of interactions Gi where each vertex v ∈ Vi represents a minimal stereo subgraph and each edge encodes an interaction between two minimal stereo subgraphs deduced from Fi : Definition 2 (Graph of interactions). Let G = ( Gm = (V, E, μ, ν ), ord ) denotes an ordered graph, and H (G ) = {S1 , . . . , Sn } is set of minimal stereo subgraphs. A graph of interactions Gi = (Vi , Ei , μi , νi ) is a graph built from G and the function of interaction Fi . Each vertices uj of Vi correspond to a minimal stereo subgraphs Sj of G. Let uj and uk be two vertices of Vi and respectively Sj and Sk their corresponding minimal stereo subgraphs. There is an edge between uj and uk if one of the function of interaction Fi (Sj , Sk ) or Fi (Sk , Sj ) is not null. The labels of the graph of interactions are defined by : • •

∀ u j ∈ V i , μi ( u ) = c S j . ∀e = (u j , uk ) ∈ Ei , νi (e ) = min(F (S j , Sk ), F (Sk , S j ))  max (F(Sj , Sk ), F(Sk , Sj )).

where  denotes the concatenation and the minimal stereo subgraphs Sj and Sk are the subgraphs associated respectively to the vertices uj and uk of Gi and cS is the code describing S and defined in [17]. Fig. 5 shows the graphs of interactions obtained from an ordered graph using the different functions of interactions. The graph G1 is built by taking all conditions. However we may suppose that the weakest interaction c1 may not be relevant. Indeed, an intersection between two minimal stereo subgraphs may not be a sufficiently relevant information to suppose an interaction between the associated stereocenters. Thus the graph G2 is designed by considering that two stereo vertices are related if we have at least kernel(s1 ) ⊂ S2 or kernel(s2 ) ⊂ S1 . Moreover, a vertex s1 is a stereo vertex because of the relative positioning of its neighbor. So we may additionally suppose that, if a stereo vertex is present in a minimal stereo subgraph (kernel(s1 ) ⊂ S2 ), but not its neighborhood (StereoStar (s1 ) ⊂ S2 ), the stereo vertex may have a similar influence in S2 than a non-stereo vertex. Thus G3 is built by considering that two stereo vertices are related if we have at least StereoStar(s1 ) ⊂ S2 or StereoStar(s2 ) ⊂ S1 . We can check if a vertex is in a minimal stereo subgraph in constant time. Thus, the complexity to check each condition (Eq. (5)) considering two minimal stereo subgraphs H1 and H2 is O (max(|H1 |, |H2 | )) for c1 , O (|ker nel (s1 )| ) for c2 , O (|StereoStar (s1 )| ) for c3 and O (|H1 | ) for c4 . The worst case complexity for computing graphs of interactions is thus equal to

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025

ARTICLE IN PRESS

JID: PATREC

[m5G;July 16, 2016;4:1]

P.-A. Grenier et al. / Pattern Recognition Letters 000 (2016) 1–9

5

Fig. 6. A minimal stereo subgraph S with the vertex of its boundaries {v1 , v2 , v3 , v4 , v5 } and their 3-neighborhood.

Fig. 5. One ordered graph and its different graphs of interactions Gi , obtained using Fi with i ∈ {1, 2, 3}.

O (|H (G )|2 max |H | ). In practice this value is small (for the vitaH∈H (G )

min dataset presented in Section 6, we have at most |H (G )| = 9 and max |H | = 24). H∈H (G )

In practice, most of vertices have different labels in a graph of interaction. Hence, an edge is almost always fully defined by the labels of its two incident vertices. Therefore using directed graphs as graphs of interaction would not give much more information. Moreover there exist a lot of graph kernels (e.g. [5,11–13]) which can be used to measure the similarities of undirected graphs. Note that for the treelet kernel [5], treelets of size 1 correspond to the vertices of the graphs of interactions, and thus encode the same notion of similarity than the bags of minimal stereo subgraphs (Section 2.2).

that a k-neighborhood can be disconnected (Sv33 ) and that two kneighborhoods can have a non empty intersection (Sv34 and Sv35 ). We want to compare two minimal stereo subgraphs located in different graphs, such that there is an equivalent ordered isomorphism f between them. As they can have different surroundings, their k-neighborhoods are compared in order to have a local measure of similarity. However we do not compare all the pairs of k-neighborhood but only the ones associated to vertices u and v which can be matched by an equivalent ordered isomorphism. As we compare a subset of pairs of k-neighborhoods we define our kernel as a matching kernel [14]. For a minimal stereo subgraph S we denote by (v1 , . . . , vn ) an ordering Ol of δ in (S) with l ∈ {1, . . . , n!}. We also denote se(Sl ) the ordered sequence of the k-neighborhoods associated to the sequence Ol = (v1 , . . . , vn ):

se(Sl ) = (Svk1 , . . . , Svkn ) The mapping between two minimal stereo subgraphs S and S is defined as :



M 4. Neighborhood of minimal stereo subgraphs The extension proposed in this section have the same goal as the previous one: to consider minimal stereo subgraphs together with their surrounding. However in this section, we have a more local approach to tackle this problem. The idea is to compare the neighborhoods of vertices located on the boundaries of two identical minimal stereo subgraphs. For a stereo subgraph S, we denote δ in (S) the set of vertices on the boundaries of S:

δin (S ) = {v ∈ S | N (v ) ⊂ S} For each vertex v on the boundary of a minimal stereo subgraphs S we define a subgraph Svk called the k-neighborhood of v: Definition 3. (k-neighborhood) Let G = (V, E, μ, ν, ord ) be an ordered graph. We denote s a stereo vertex of G and S its minimal stereo subgraph. The kneighborhood of v, a vertex of δ in (S), is the induced subgraph Svk of G such that:

  d (u, v ) ≤ k VSvk = u ∈ G − S    ∀v ∈ δ (S ), d (u, v ) ≤ d (u, v ) 

in

Fig. 6 shows an example of k-neighborhoods associated to vertices of the boundary of a minimal stereo subgraph. We can notice

S,S

=



∃ f ∈ IsomEqOrd(S, S ) (se(Sl ), se(Sl ))



s.t ∀i ∈ {1, . . . , n}, f (vi ) = vi

(6)

where se(Sl ) = (Svk1 , . . . , Svkn ) and se(Sl ) = (Svk , . . . , Svk ). A couple n 1 of ordered sequences (se(Sl ), se(Sl )) is an element of the mapping MS,S if there is an isomorphism which maps those sequences. The kernel between those minimal stereo subgraphs is defined by:

kin f (S, S ) =



n

 δ (S, S ) kt (Svki , Svk ) i g(S )!g(S )! (se(S ),se(S ))∈MS,S i=1

(7)

where δ (S, S ) is equal to 1 if it exists an equivalent ordered isomorphism between S and S , kt is a kernel between graphs and g is a function which associates to a minimalstereo subgraph the size |δ in (S)| of its boundary. The division by g(S )!g(S )! removes the influence of the arbitrary ordering introduced by se(S). Proposition 1. The kernel kinf (S, S ) defined in Eq. (7) is definite positive. Proof. Proof can be found in A.1.



Finally, the kernel between ordered graphs is computed by comparing their set of minimal stereo subgraphs:

kin f G (G, G ) =



kin f (S, S )

(8)

S∈H (G ) S ∈H (G )

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025

ARTICLE IN PRESS

JID: PATREC 6

[m5G;July 16, 2016;4:1]

P.-A. Grenier et al. / Pattern Recognition Letters 000 (2016) 1–9

a

Fig. 7. Example of a minimal stereo subgraph S with its induced subgraphs associated to the vertex of StereoStar∗ (s).

b

5. Similarity of minimal stereo subgraphs The similarity principle (Section 1) may also be applied recursively to patterns used to measure the similarity of molecules: we may suppose that similar patterns have a similar influence on a property. However, in the stereo kernel and in the two previous extension, we consider that two different minimal stereo subgraphs have a null similarity. Thus we present in this section a measure of similarity between minimal stereo subgraphs which do not give zero when those subgraphs are different. As those subgraphs are ordered graphs, we cannot use an usual measure of similarity between graphs. We have to consider the order defined on the neighborhood of the stereo vertices. To characterize a minimal stereo subgraph S associated to a stereo vertex s, we associate to each vertex vq of StereoStar∗ (s) an induced subgraph Hq of S. As this subgraph is an induced subgraph, we can define it by its set of vertices V(Hq ):

V (Hq ) = {v ∈ S − kernel (s ) | d (v, vq ) =

min

u∈StereoStar ∗ (s )

d ( v, u )}

Fig. 7 show an example of subgraphs associated to the neighbors of two carbons linked by a double bond. In order to compare two stereo vertices, we compare the subgraphs associated to their neighbors. However, those neighbors can be ordered in different but equivalent order. We thus define for a minimal stereo subgraph S, a set of re-ordering functions  S , used to create the measure of similarity between minimal stereo subgraphs. This set is composed of all the valid re-ordering functions which only modify the order around kernel(s):

 S = {σ ∈  | ∀v ∈ S − kernel (s ), σ (v ) = Id|N (v)| } where Idn is the identity on the set of permutations of n elements. Let S and S denote two minimal stereo subgraphs respectively associated to the stereo vertices s and s . If s and s encode asymmetric carbons we denote ord (s ) = (s1 , . . . , s4 ) and ord (s ) = (s1 , . . . , s4 ). If they represent one carbon of a double bond, we denote ord (s ) = (n= (s ), s1 , s2 ), ord (n= (s )) = (s, s3 , s4 ), ord (s ) = (n= (s ), s1 , s2 ) and ord (n= (s )) = (s , s3 , s4 ) (where n= (s ) is the other carbon of the double bond (Section 2)). We denote respectively Hi and Hi the subgraphs of S and S associated to si and si . Definition 4 (Kernel between minimal stereo subgraphs). For all re-ordering function σ of  S , we denote ϕ σ the permutation that σ performs on the vertices of StereoStar∗ (s). If s encodes an asymmetric carbon, then ϕ σ is equals to σ (s). The kernel between minimal stereo subgraphs is defined by:

k(S, S ) =



|StereoStar  ( s )|

σ ∈ S  σ  ∈ S

i=1

kt (Hϕσ (i ) , Hϕ  (i ) ) σ

where kt is an usual kernel between graphs.

(9)

Fig. 8. Two minimal stereo subgraphs with opposite order.

Let us consider the molecule of Fig. 8a where subgraphs H1 and H3 are similar. Let us further consider the same molecule with an opposite orientation in Fig. 8b. Since H1 = H3 and H3 = H1 , H1 is 4  similar to H1 and H3 to H3 . Hence the product kt (Hi , Hi ) has a i=1

high value. More generally, if a minimal stereo subgraph has two similar subgraphs, such a subgraph may be characterized as mostly not stero since it is similar to the stereo subgraph with an opposite orientation. Such a property may induce erroneous predictions since some molecular properties such as the optical rotation get an opposite sign for two opposite stereoisomers. We should thus decrease the value of the kernel if the two compared minimal stereo subgraphs are opposed. To do so, we use the following kernel:

kinter (S, S ) =



k(S, S ) k(S, S ) × k(S , S )

+ δ (S, S ) − δ (S, τ (S ))

(10)

where k is the kernel defined in Eq. (9), δ (S1 , S2 ) is equal to 1 if there is an equivalent ordered isomorphism between S1 and S2 and τ is a (non-valid) re-ordering function which permutes two neighbors of s (τ (S ) is thus the opposite of S ). Proposition 2. The kernel kinter (S, S ) defined in Eq. (10) is definite positive. Proof. Proof can be found in A.2.



For two opposed minimal stereo subgraphs, we have δ (S, τ (S )) equals to 1 and δ (S, S ) to 0. Thus for the example of Fig. 8, although H1 , H3 , H1 and H3 are similar, the value of the kernel is negative, and thus S and S are not considered as similar. If S and S are two minimal stereo subgraphs such that neither S nor τ (S ) are equals to S then the minimal stereo subgraphs are not opposed and (δ (S, S ) − δ (S, τ (S )) is equal to zero. As for the kernel of Section 4, the similarity of two ordered graphs is computed by applying the kernel between each of their minimal stereo subgraphs:

kinterG (G, G ) =



kinter (S, S )

(11)

S∈H (G ) S ∈H (G )

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025

ARTICLE IN PRESS

JID: PATREC

[m5G;July 16, 2016;4:1]

P.-A. Grenier et al. / Pattern Recognition Letters 000 (2016) 1–9

7

Table 2 Prediction of the biological activity of synthetic vitamin D derivatives. Method

RMSE

[12] [5] [3] Stereo Kernel

0.251 0.271 0.184 0.194

567-

(Section 3) with Graph of interactions 1 Graph of interactions 2 Graph of interactions 3

[5] 0.177 0.169 0.172

8910 -

(Section 3) with Graph of interactions 1 Graph of interactions 2 Graph of interactions 3

[12] 0.185 0.162 0.161

11 -

kinfG (Section 4)

0.177

1 2 3 4

-

The best value appear in bold, and the second best appear in italic.

Fig. 9. One molecule of the first dataset and its different graphs of interactions. Table 1 Classification of the ACE inhibitory activity of perindopirilates stereoisomers. Method

Accuracy

12-

[3] Stereo Kernel

96.875 87.5

345-

(Section 3) with Graph of interactions 1 Graph of interactions 2 Graph of interactions 3

[5] 93.75 93.75 84.375

678-

(Section 3) with Graph of interactions 1 Graph of interactions 2 Graph of interactions 3

[5] and MKL 100 87.5 90.625

The best value appear in bold, and the second best appear in italic.

erty. This explains why the adaptation of the tree-pattern kernel to stereoisomerism (line 1) and the two other graphs of interactions with treelets (lines 3 and 4) obtain a better accuracy. The kernel of [3] have better results than the treelet kernel on our graph of interactions. However, by using a multiple kernel learning algorithm [16], we can learn a weight for each treelet, that allow us to discard treelet not relevant for this classification and to obtain the best results with the first graph of interactions (line 6). As we can see in Fig. 9b, graphs of interactions G2 and G3 have very few edges. Having no edge in a graph is like not having a graph structure, and thus kernels applied on the graphs of interactions G2 and G3 are close to the stereo kernel. The second dataset is a dataset of synthetic vitamin D derivatives, used in [3]. This dataset is composed of 69 molecules, with an average of 8.55 stereocenters per molecule. This dataset is associated to a regression problem, which consists in predicting the biological activity of each molecule. To test the kernel which compares the neighborhoods (Section 4) we must choose the kernel kt between graphs used in Eq. (7). We have selected a “weight” kernel defined by: −(w−w )2 d

6. Experiments

kt (G, G ) = e

For all the experiments we use the same protocol: a nested cross-validation which selects parameters and estimates the performance. The outer cross-validation is a leave-one-out procedure, used to compute an error of prediction for each molecule of the dataset. For each fold, we use another leave-one-out procedure on the remaining molecules, to compute a validation error. We use standard SVM methods for classification and regression of molecules. Our first experiment is based on a dataset composed of all the stereoisomers of the perindoprilate [4]. As this molecule has 5 stereocenters, the dataset is composed of 25 = 32 molecules. In this dataset, we try to predict if a molecule inhibit the angiotensinconverting enzyme (ACE). Fig. 9 a shows one molecule of this dataset with its minimal stereo subgraphs and Fig. 9b shows the three graphs of interactions obtained from this graph. As all molecules of this dataset are stereoisomers, their graphs of interactions only differ by the labels of their vertices. Table 1 presents the accuracy obtained by the stereo kernel [7], the adaptation of the tree-pattern kernel to stereoisomerism [3] and by using the graphs of interactions. In this dataset two stereocenters (S5 and S3 in Fig. 9) have a same minimal stereo subgraph, but different surroundings. The stereo kernel (line 2) and the graph of interactions G3 (line 5), cannot differentiate those two stereocenters, which have a different influence on the prop-

where d is a parameter and w is the weight of a molecular graph (defined as the sum of the atomic mass of atoms encoded by the vertices of the graph). In practice, usual graph kernels [5,12] does not provide significantly better results than this kernel. Methods which do not encode stereoisomerism information [5,12] obtain poor results as we can see in Table 2 (lines 1 and 2). The adaptation of the tree pattern kernel to stereoisomerism [3] and the stereo kernel (lines 3 and 4) improve the results over the two previous methods hence showing the insight of adding stereoisomerism information. Taking into account relationships between minimal stereo subgraphs (lines 5 to 10) allows us to obtain better results than the stereo kernel (line 4). Unlike the previous dataset, edges information is relevant for G2 and G3 which have a higher degree (≈2). This last point explains the improvement over lines 3 and 4 observed on lines 6, 7, 9 and 10. Graph G1 have a high degree (4) and a high number of different labels on vertices, which induces a lot of unique patterns in each graph. This last point decreases the number of patterns common to two graphs and explains why on this dataset G1 does not obtain results as good as G2 and G3 . Taking into account the neighborhood of minimal stereo subgraphs (Section 4, line 11) allows to obtain better results than the stereo kernel. However those results are less good than the ones obtained by using graphs of interactions. Only a few molecules benefit from this extension. Indeed, for the majority of molecules the neighborhoods of identical minimal stereo

(12)

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025

ARTICLE IN PRESS

JID: PATREC 8

[m5G;July 16, 2016;4:1]

P.-A. Grenier et al. / Pattern Recognition Letters 000 (2016) 1–9 Table 4 Classification of the 67 pairs of stereoisomers.

123-

Method

Accuracy (%)

Rate of unclassified pairs (%)

[3] Stereo kernel kinterG with selection

80.6 58.2 88.1

4.5 38.8 0

The best value appear in bold, and the second best appear in italic.

Fig. 10. Example of a reductions which create two opposite stereoisomers. Table 3 Classification of the 50 pairs of stereoisomers.

123-

Method

Accuracy (%)

Rate of unclassified pairs (%)

[3] Stereo Kernel kinterG (Section 5)

90 66 92

0 30 0

The best value appear in bold, and the second best appear in italic.

subgraphs are similar and thus the kernel does not add a lot of information. We have tested the last extension (Section 5) on two datasets from [15]. For both datasets, we have some molecules reduced in two opposite stereoisomers as we can see in Fig. 10. However, for all these reductions, one of the two stereoisomers is present in greater quantity than the second. This stereoisomers is assigned to a class A and the other one to a class B. We have therefore two coupled classification problems, where each molecule have to be in the opposite class than its opposite stereoisomer. Unlike with previous datasets, we do not use a leave-one-out procedure but a leave-one-pair-out procedure. It means that for each fold of the nested-cross validation we withdraw a pair of stereoisomers. If we obtain the same class for both stereoisomers we consider the pair as unclassified. The first of those datasets is composed of 50 pairs of stereoisomers, where each of those stereoisomers have only one stereocenter. Results of the classification for this dataset are shown in Table 3. 15 pairs of stereoisomers have an unique minimal stereo subgraph in this dataset. Thus for those 15 pairs the stereo kernel assign the two stereoisomers to a same class. It explains why the stereo kernel have a poor accuracy on this datasets compared to the kernel of [3]. However, the extension presented in Section 5 allows to compare different minimal stereo subgraphs, and obtains better results. The other dataset contains 67 pairs of stereoisomers. Unlike the previous one, some molecules have more than one stereocenter. As only one minimal stereo subgraph is different between two molecules of a pair, some molecules have common minimal stereo subgraphs with their opposite stereoisomer. For the kernel between minimal stereo subgraphs (Section 5), we choose to only compare the minimal stereo subgraphs which differs between the two molecules of a pairs. Indeed, if two molecules have some identical minimal stereo subgraphs, those subgraphs cannot help to distinguish the two molecules. Accuracy and rate of unclassified pairs are reported for this dataset in Table 4. As for the previous dataset, the stereo kernel (line 2)

have a poor accuracy since it assigns a lot of pairs of molecules to the same class. Due to identical stereocenters in pair of molecules, the tree pattern kernel adapted to stereoisomerism (line 1) have also some unclassified pairs. By using a selection with our inter stereo kernel (line 3) we have a null rate of unclassified pairs. This method obtains the best accuracy. Moreover we can notice that its accuracy is higher than the accuracy plus the rate of unclassified pairs of the tree-pattern kernel adapted to stereoisomers. Thus we can suppose that a combination, which remains to be designed, of the tree-pattern of [3] with a selection of the different minimal subgraphs of a pair of molecules would still obtain a lower accuracy than the inter stereo kernel with selection. 7. Conclusion The stereo kernel have two main drawbacks: it considers each minimal stereo subgraph without taking into account its surroundings, and it considers that two non identical minimal stereo subgraphs have a null similarity. The three extensions presented in this paper allow to tackle those drawbacks. The first drawback is tackled by the two extensions presented in Sections 3 and 4. In both extensions we take into account the surroundings of the minimal stereo subgraphs but with different scales. In the first case (Section 3) we have a global approach where we study interactions between minimal stereo subgraphs. In the second case (Section 4) we have a more local approach where we compare the direct neighborhood of minimal stereo subgraphs. Finally, we propose in Section 5 a measure of similarity between minimal stereo subgraphs, which allows to compare different minimal stereo subgraphs. Experiments have shown that these extensions allow to obtain better results than the stereo kernel and the kernel of [3]. However we have tested all this extensions independently. It could be useful to integrate all those extensions in one kernel. However, having a non-null similarity between different minimal stereo subgraphs thanks to the extension of Section 5, implies that we can no longer count identical patterns in the graphs of interactions. Therefore, combining these extensions is not a direct task, and it will be the next step of this work. Acknowledgments This work has been performed using computing resources partially funded by the CPER Normandie. Appendix A We give here a sketch of the proof of definite positiveness of our kernels. More details can be found in [6]. A.1. Appendix 1 We want to prove that the kernel kinf , defined by Eq. (7), is definite positive. The first term of this equation δ (S, S ) is definite positive. It corresponds to a scalar product in a vector space where each component of a vector correspond to an ordered graph.

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025

ARTICLE IN PRESS

JID: PATREC

[m5G;July 16, 2016;4:1]

P.-A. Grenier et al. / Pattern Recognition Letters 000 (2016) 1–9

The second term √ 1

g ( S )!



1 g ( S  )!

is definite positive as the prod-

uct of two functions of S and S . The final term km (S, S ) =  n k k (se(S ),se(S ))∈M  i=1 kt (Svi , Sv ) is a mapping kernel [14]. S,S

i

We can prove that the mapping, defined in (6) and used in this kernel, is symmetric and transitive by symmetry and transitivity of the equivalent ordered isomorphism [7]. As the mapping is transitive, we have by [14], that the corresponding mapping kernel km (S, S ) is definite positive. Finally, as the product of definite positive kernels is definite positive, kinf is definite positive. A.2. Appendix 2 We want to prove that the kernel kinter , defined by Eq. (10), is definite positive. The kernel defined in Eq. (9) is a convolution kernel [9]. Thus the first part of kinter is definite positive. We have to prove that δ (S, S ) − δ (S, τ (S )) is definite positive. We consider the function  which associates to each minimal stereo subgraphs S a vector  (S). Each index  (S)c of this vector corresponds to a code c obtained by the method described in [17]. The components of this vector are defined by:

(S )c =

⎧ 1 ⎪ ⎪ √ ⎪ ⎨ 2

1

−√ ⎪ ⎪ ⎪ ⎩ 2 0

if c = cS if c = cτ (S ) otherwise

The scalar product between  (S) and  (S ) is δ (S, S ) − δ (S, τ (S )), therefore this term is definite positive. Finally, as the sum of definite positive kernel is definite positive, kinter is definite positive.

9

[2] V. Bonnici, R. Giugno, A. Pulvirenti, D. Shasha, A. Ferro, A subgraph isomorphism algorithm and its application to biochemical data, BMC Bioinform. 14 (Suppl. 7) (2013) S13. [3] J. Brown, T. Urata, T. Tamura, M.A. Arai, T. Kawabata, T. Akutsu, Compound analysis via graph kernels incorporating chirality, J. Bioinform. Comput. Biol. 8 (1) (2010) 63–81. [4] J.A. Castillo-Garit, Y. Marrero-Ponce, F. Torrens, R. Rotondo, Atom-based stochastic and non-stochastic 3d-chiral bilinear indices and their applications to central chirality codification, J. Mol. Graph. Modell. 26 (1) (2007) 32–47. [5] B. Gaüzère, Application des méthodes à noyaux sur graphes pour la prédiction des propriétés des molécules., Université de Caen, 2013 Ph.D. thesis. [6] P.-A. Grenier, Modélisation de la stéréochimie : une application la chémoinformatique, Université de Caen, 2015 Ph.D. thesis. [7] P.-A. Grenier, L. Brun, D. Villemin, A graph kernel incorporating molecule’s stereisomerism information, in: Proceedings of 22nd International Conference on Pattern Recognition (ICPR), 2014, pp. 631–636. [8] P.-A. Grenier, L. Brun, D. Villemin, Taking into account interaction between stereocenters in a graph kernel framework, Technical Report, CNRS UMR 6072 GREYC, 2014. https://hal.archives-ouvertes.fr/hal-01103318 [9] D. Haussler, Convolution kernels on discrete structures, Technical Report, Technical report, Department of Computer Science, University of California at Santa Cruz, 1999. [10] J. Jacques, A. Collet, S.H. Wilen, Enantiomers, Racemates, and Resolutions, Wiley, 1991. [11] H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: ICML, vol. 3, 2003, pp. 321–328. [12] P. Mahé, J.-P. Vert, Graph kernels based on tree patterns for molecules, Mach. Learn. 75 (1) (2009) 3–35. [13] N. Shervashidze, P. Schweitzer, E.J. Van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler-lehman graph kernels, J. Mach. Learn. Res. 12 (2011) 2539–2561. [14] K. Shin, T. Kuboyama, A generalization of Haussler’s convolution kernel: mapping kernel, in: Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 944–951. [15] J.-J. Suo, Q.-Y. Zhang, J.-Y. Li, Y.-M. Zhou, L. Xu, The derivation of a chiral substituent code for secondary alcohols and its application to the prediction of enantioselectivity, J. Mol. Graph. Modell. 43 (2013) 11–20. [16] M. Varma, B.R. Babu, More generality in efficient multiple kernel learning, in: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 1065–1072. [17] W.T. Wipke, T.M. Dyott, Stereochemically unique naming algorithm, J. Am. Chem. Soc. 96 (15) (1974) 4834–4842.

References [1] S. Bernstein, W.J. Kauzmann, E.S. Wallis, The relationship between optical rotatory power and constitution of the sterols, J. Org. Chem. 6 (2) (1941) 319–330.

Please cite this article as: P.-A. Grenier et al., Chemoinformatics and stereoisomerism: A stereo graph kernel together with three new extensions, Pattern Recognition Letters (2016), http://dx.doi.org/10.1016/j.patrec.2016.06.025