Organic synthesis planning: An algorithm for selecting strategic bond forming sequences


Tetrahedron Vol. 45, No. 9, pp. 2665 to 2676, 1989
Printed in Great Britain.

0040-4020/89 $3.00+.00
© 1989 Pergamon Press plc

ORGANIC SYNTHESIS PLANNING: AN ALGORITHM FOR SELECTING STRATEGIC BOND FORMING SEQUENCES

Luca BAUMER(#), Giordano SALA, Guido SELLO*

(Received in UK 31 January 1989)

Introduction. This paper for OSP

is part of a series

presently

topological principles

analysis on

following

which

Previous as

Lilith

is

of

sometimes outputs

set of

our program Previous

systems

Lilith,

papers

perception,

bonds

which

a system

were

about

and the logical

bonds is based. Of course,

strategic

all of the

constitutes

the

in the next section we recall some of

problem;

intermediate

rules to select a precursor

on

in order to reduce the otherwise bond is to be made

an implicit a

consequence.

very limited

principle

set of

On the contrary, possible

order

'inside'

space of the solutions;

each synthetic

point

the strategic

solutions

in the

section

of

for each fragmentation

of fragmentations);

of the most suited

of the

the way from the simple starting enormous

'before' or 'after' a certain

grants a low number

to the identification

so we consider

the

passage more as an optimization

precursors

than as a method

to prune

tree.

Hendrickson, convergent

(or privileged)

bond formation

the synthetic

for ring

in the field (5) often use heuristic

convergence

step devoted

department.

of the strategic

on the

to the synthesis

a particular

already

(and the sorting

selection

to the target,

synthesis

our

and terminology.

works

that

at

target molecule

must work

a necessary

materials deciding

the

solution

the main concepts

(l-4) in which we describe

development

of a

procedures

retrosynthetical

target

under

whose

SYNGEN

retrosynthesis,

(7) is

currently

the

has already demonstrated

is the sequence:

paramount

example

of a program

(8) that the optimized

synthesis

for path


1) starting

materials

2) intermediate 3) precursor

/

1

affixation

1

/ 2

neglecting

the

refunctionalization

necessary,

depending

/ cyclization

on target's

the two

cyclization

are

which

must be

bond

defined

the order

At

Furthermore,

Lilith premises Depending breaking

(or

ambiguity

to

walk

because

on the

Lilith's

and

construction

many

a cycle

is presently

the ideal

times

as

given to select

(9) that "bondsets may be further But how

of an acyclic is

affixation

order of cyclizations.

is this "ordered

the order of bond making

in a bondset"

is

compound.

taken care

to the

of;

refunctionalizations,

first fragmentation;

starting materials

if a requirement

is not fulfilled,

of

either one of the

for a new, analogous

speak of 'bond forming which are

(or making)

the reversal

processing.

so,

'bond

breaking'

path,

and

sequence'

of each

Lilith does not allow skeleton

reaction

"complexity" each fragment

and mutual

strategy

always

other.

However,

disconnections

implies

conversely

or of 'bond

'bond

a

no

on the way

retrosynthetical

forming'

is the well-known

principle

implies

of convergence.

is the one which makes the target out of two fragments

the smaller

approach,

a

the complexity

is defined as well

(1) as

difference, a simple

as their species,

the

It means of similar

better the solution.

function

that considers

substitution

In our

the number of

levels, stereochemistries

positions.

From the definition two atoms)

follows:

Distance the

The

is

are constructed".

defines

fragmentations

sequence',

best solution

convergent

constitutes

in fact he stated

refer only

target;

atoms

between

as

direction.

complexity,

in

which

or both, can become the target

arise,

basis of

that the

one, and

/ available

cleaving)

precursors

The

formation

(the bonds)

on the context, we

(antithetical)

2-3

and terminology

should

synthetical

steps

etc. will be evoked at a later stage.

we always

obtained,

iterating

and no theory or heuristics

only skeleton

small / simple

two synthons

from

in which

protections,

2

size.

The answer "convergency

this level

and

bonds whose

only for the subsequent

activations,

reaching

steps

aware of the problem:

chosen?

sufficient

/ intermediate

the affixation

was

by

or more

I;

1

/ target

interchangeable,

Hendrickson

bondset"

/ precursor

affixation

4) intermediate

However,

/ intermediate

/ cyclization

of atomic complexity,

it is the shortest

calculation farthest

points

retrosynthesis output of

solution,

in

contains

only

fragmentation;

turn,

of

is

the "centre"

the

a set

All

the

to

(between

as a set of atoms halfway

strategic

bonds

chosen

for

is a set of solutions;

each

atoms.

section, which we call MSSB,

subgraph

distance"

them.

of the molecule

structure.

of strategic

necessary

synthetic

of "complexity

path joining

span from these central

the strategic

the bonds the

identifies

the concept

complexity

bonds

split the representing

(a

subset of

target a

into

the MSSB).

Each solution

two parts for a particular

single solution

contains

only the


intermediates

originating

into the double The

object

i.e. choosing possible

sequences. a single

alternative

sequences,

and

they all converge

is the sorting of the N bonds in every proposed

this paper,

the 'best' chronological

may coincide,

bond forming

leaf of precursors. of

made

by

from different

When exiting unbranched

i.e. the

sequence

this procedure,

first breakage(s)

representation

the synthetic

for each solution.

line

graph

Of course,

is reduced to a tree

some intermediate

can be common to some solutions;

of the final synthetic

INITIAL

graph, as shown

in FIG.

TREE

LILITH

1.

TREE

TREE FIG.

The bond;

procedure

this

applies

is equivalent

one or more the target

solutions

only to

which require the breaking

to say that the procedure

rings in their centre.

I

of more than one

applies only to substances

This follows from the forcing

into two (and no more than two) separate

points

this allows an

[FIG. 1 panel labels, partly illegible: FINAL, COMPLETE.]

solution,

in which they must be formed among the N!

requirement

that contain which divides

parts.

Principles

No absolute process

of

sensitivity. coherence

rule applies

synthesis A

with

computer

stated

that

from them consider

the most can be

rules, though. section

ease of reaction,

main criterion

important

when deciding

MSSB;

to

relies upon We chose

efficiency,

atoms

the bond forming

these rules

experience

applying

principles.

logic

and and

We thus examine

and ring closures.

that is, adherence

are the

consequently,

in a not-computerized

the chemist's

some basic, general

steric hindrance

is strategic

part of

like many other points

it generally

needs

the strategic

bond centrality, The

to this subject;

planning,

to strategic

rules. We

central ones and that only bonds spanning bond centrality

sequence.

is also the first topic to


Another reaction hindrance part

important

can

evaluation.

selects

follows

point to consider

take place

reaction,

intramolecular

reactions

point of It

routine

that the

one: this

is

first bond

while all the following are easier;

view of

is therefore

are not heavily Mutual

an intermolecular

reason the

the consideration

intermolecular

from the

to

ease with which an intramolecular

divided

is related

reactions

in particular,

steric accessibility, sensible

is the

only one

made

bonds.

This

through

an

In general,

are intramolecular.

they have

less strict requirements

if they have sufficient

to choose to prepare

to steric

into two parts: the first

the first bond to be made, and the second sorts all the remaining

from

freedom.

compared

For this

is the different

conformational

first a bond whose reaction

sites

hindered.

topological

relationship

between

bonds

is the

third factor necessary

to the

task.

The algorithm

In

order to

fragmented

select

cutting

that all

the

the necessary

representation substituted processed

of

to

bond and

the molecular

saturate

again

the

by the

the identification

longest

existing we

points

in the

now

respectively

distinguish

between

extremes,

From which

structure

is

structure

So,

with

'similar' to

of the structures

minimal

MAXDi

with those of the target if this

bond is

product

which

is as

perform

the

maximum

strategic

subsequent

and convergence

(11).

target's

(This means

the

computer's

group is added or new

structure

joining

path is the

between molecular

MAXD)

and

is

distances

extremes

MAXD1, MAXD2...,

bonds 1, 2..., are not cleaved.

decreases

of these structures

extremes. the

Secondly,

Two will

no MAXDi

number of possible

paths

path can appear. strategic

(with respect

bonds, it follows

to the target);

that the bond

it means

that this

This should also be graphically also shows that the partially

which maximizes

the coincidence

evident cleaved of

the

(10). first,

similar as possible ring

bonds

the one

nearly completely

of complexity

shortest

rule, the extremes

the target.

constructed

efficiency,

(the

in FIG. 2, which

is also

This

atoms whose

atoms and

is the most central

the most

atom or

to the calculation

no new shorter

of central

No

nor with the target's

breaking

is

atoms.

(1) this distance

as a general

extremes

target

that constitute

modified).

devoted

MAXD0

the

separated

couples of

because

and, of course,

MAXDi

from a comparison

extremes

MAXD0,

are

where only the strategic

firstly,

our definition

minimizes

formed,

bonds of the current solution. matrices

the

We called

between

with each other's

less than

of

(3)

of the

in the structure

can be

structure

valence

molecule.

must be stressed:

not coincide

to be

connectivity

procedure

and to

MAXD;

first bond

all but one of the strategic

closures which

in

the first

synthetic

to the target

itself,

the desired

direction.

follows

directly

step

will arrive

at a

and it should be easier This is the principle

to of

from the joining of simplification

FIG. 2 exemplifies the results for a few structures. (Dots highlight atoms that are members of extreme couples, whose MAXD is reported.)
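The notions of MAXD and of extreme couples can be illustrated with a minimal sketch. It assumes that the distance between two atoms is approximated by an unweighted bond count along the shortest path; the distance used in the paper is complexity-based, and that weighting is not reproduced here. The function names and the example skeleton are hypothetical.

```python
from collections import deque


def bfs_distances(adj, start):
    """Shortest-path length (in bonds) from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def maxd_and_extremes(adj):
    """Return the longest shortest-path (a stand-in for MAXD) and the
    couples of atoms (extremes) separated by that distance."""
    best, couples = 0, []
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if u < v:  # count each couple once
                if d > best:
                    best, couples = d, [(u, v)]
                elif d == best:
                    couples.append((u, v))
    return best, couples


# Hypothetical skeleton: chain 0-1-2-3-4 with a branch atom 5 on atom 1.
skeleton = {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1]}
maxd, extremes = maxd_and_extremes(skeleton)
print(maxd, sorted(extremes))
```

In this toy skeleton two couples share the same maximum distance, which mirrors the situation in the text where a structure can have several extreme couples for one MAXD value.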

[FIG. 2. Example structures with extreme atoms marked by dots; legible label: MAXD = 21.]

The simplification made on the complete synthetic tree is evident in the figure. In both structures there are three strategic bonds whose breakage, without an ordering, would produce 6 possible solutions, that the algorithm cuts down to 1. (The situation is similar to the one shown in the first half of FIG. 1, in the left part of the tree.)
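The reduction from N! orderings to a single sequence can be made concrete with a short sketch. The bond labels are hypothetical, and the ranking values only borrow the MAXD figures quoted for the adrenosterone example (34, 33, 30); which bond carries which value is an assumption made for illustration.

```python
from itertools import permutations
from math import factorial

# Three hypothetical strategic bonds of one solution.
strategic_bonds = ["bond_1", "bond_2", "bond_3"]

# Without an ordering criterion every permutation is a distinct sequence.
all_sequences = list(permutations(strategic_bonds))
n_unordered = len(all_sequences)

# Assumed ranking values (MAXD-like): one sort replaces the enumeration.
ranking = {"bond_1": 34, "bond_2": 33, "bond_3": 30}
chosen_sequence = tuple(sorted(strategic_bonds, key=ranking.get))

print(n_unordered, chosen_sequence)
```

Sorting once on a per-bond value is what keeps the number of retained sequences at one per solution instead of N!.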

Let us now consider the molecule of adrenosterone to explain the work done by the algorithm. The target has 3 couples (formed by 5 atoms) with MAXD equal to 25, the first solution

1 couple with MAXD equal to 34 and second

solution

target,

only 1 atom

the third solution

Strategic

efficiency

is

thus the

will be the last, the others basic criterion

intermolecular Congestions

giving the

starting

one)

of

is difficult

quantified

parameterization

we use

to order

structure

overridden

if

the

first

through

a

of G.A

MAXDi.

described

heuristic

(the site.

bond A-B, are classified

below. The total congestion

function,

and G.B (12). If

reaction

at the reaction

which

is

essentially

the i-th bond has minimum

< (G.T(i) - G.T(j)),then

G.T(i)

MAXDi,

of bond an

additive

bond j replaces

a

bond i in the

sequence.
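The ordering rule and the congestion override read from the fragments above can be sketched as follows. This is an interpretation under stated assumptions, not the authors' Fortran code: bonds are sorted by increasing MAXDi, and a bond j takes the first place from the minimum-MAXD bond i when (MAXDj - MAXDi) < (G.T(i) - G.T(j)). The class name, the tie-breaking choice (lowest MAXD among qualifying bonds) and the congestion numbers are hypothetical; the MAXD values 30, 33 and 34 are those quoted for adrenosterone.

```python
from dataclasses import dataclass


@dataclass
class StrategicBond:
    name: str
    maxd: int        # MAXDi of the structure in which only this bond is formed
    congestion: int  # total congestion G.T(i) of the bond


def forming_sequence(bonds):
    """Order bonds by increasing MAXD, then apply the congestion swap."""
    seq = sorted(bonds, key=lambda b: b.maxd)
    if len(seq) > 1:
        first = seq[0]
        # Bonds j with (MAXDj - MAXDi) < (G.T(i) - G.T(j)).
        candidates = [
            b for b in seq[1:]
            if (b.maxd - first.maxd) < (first.congestion - b.congestion)
        ]
        if candidates:
            # Assumption: among qualifying bonds, take the lowest MAXD.
            j = min(candidates, key=lambda b: b.maxd)
            seq.remove(j)
            seq.insert(0, j)
    return seq


congested = [
    StrategicBond("A", 30, 8),
    StrategicBond("B", 33, 3),
    StrategicBond("C", 34, 6),
]
print([b.name for b in forming_sequence(congested)])  # ['B', 'A', 'C']
```

With a less congested bond A (for example congestion 4) no bond satisfies the inequality and the plain MAXD order A, B, C is kept.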

Another

possibility

two bonds

spanning

particular

reaction

cyclizations

three

A-B

but there is

of this kind of swap.

FIG.

to

which

the

reaction

the target) bonds a,

in double-affixation bond between

at the

database),

same time

than forming

contains no but

from

can often has a

a single,

experience

be performed

higher

reference it

is

to

known

any that

in a single synthetic

strategic

though central,

3

case (breakage of

explicit

efficiency

(closer

bond. So, if there are

b, c such that MAXDa < MAXDb and MAXDa < MAXDc,

with each other, while a is not, then a is replaced

but b and c are

by the lower-MAXD

b and c.
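The three-bond swap can be sketched in the same spirit. The reading assumed here is that bonds b and c qualify when they involve the same atom (the double-affixation case mentioned in the text) while bond a shares no atom with either; the dictionary layout, the function names and the placement of a after b are assumptions made for illustration.

```python
def shares_atom(bond_x, bond_y):
    """True when the two bonds involve a common atom."""
    return bool(set(bond_x["atoms"]) & set(bond_y["atoms"]))


def swap_for_adjacent_pair(bonds):
    """If the minimum-MAXD bond a is isolated while two other bonds b and c
    share an atom, let the lower-MAXD bond of that pair go first."""
    seq = sorted(bonds, key=lambda b: b["maxd"])
    if len(seq) < 3:
        return seq
    a, rest = seq[0], seq[1:]
    for i, b in enumerate(rest):
        for c in rest[i + 1:]:
            if (a["maxd"] < b["maxd"]
                    and shares_atom(b, c)
                    and not shares_atom(a, b)
                    and not shares_atom(a, c)):
                # rest is sorted, so b is the lower-MAXD bond of the pair.
                return [b] + [x for x in seq if x is not b]
    return seq


bonds = [
    {"name": "a", "atoms": (1, 2), "maxd": 30},
    {"name": "b", "atoms": (5, 6), "maxd": 33},
    {"name": "c", "atoms": (6, 7), "maxd": 34},
]
print([b["name"] for b in swap_for_adjacent_pair(bonds)])  # ['b', 'a', 'c']
```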

At this point, the last breakage the others

is the double-affixation

(1)). Lilith

a double-affixation

two bonds

strategic

must be checked

same atom

(no

involving

Forming

approach

most

the bond forming

resulting

atoms A and B, which must form the i-th strategic

FIG. 3 shows two examples

step.

the with

the bond with the largest

because of heavy steric congestion

bond such that (MAXDj - MAXDi)

priority

main criterion

MAXDi must be formed first,

(if any) in order of increasing

could be

as G.A and G.B by the subroutine

j-th

the solution

the

with 2 of

has 3 couples with MAXD equal to 30 and 3 atoms coinciding

the bond with the smallest

This

is

coinciding

has

target,

to the target.

sequence: MAXDi

with one of the

has 3 couples with MAXD equal to 33 and 2 atoms

3 of the target. This last is evidently similar

coinciding

is entered.

is definitively

chosen,

and the procedure

which sorts

purpose

Its

is

to obtain

cyclizations.

The

bond,

in the position

if any,

The adjacency have already structure,

bonds,

stated

either

containing as the

them

step

sequence

is to

place the

adjacent

to it.

which

bonds

in

maximizes

the

ease

double-affixation

that

an MSSB

containing

fused or bridged, pair of

both. But

defined

and

algorithm

we always

gives

than two

the

bonds implies

level. We

a polycyclic

at the center of the molecule.

strategic

described

more

of

with the best

of bonds which are part of the same ring is at a lower priority

for every

So, (SSSR),

first

a breakage


in

precedence

bonds,

we could

refer only

to the

a previous

paper

to

the creation

find a

ring or envelope

Smallest (3).

Set

of rings

of Smallest

Rings

Using this SSSR for sorting

of the

smallest

and

'relevant'

rings. An example of a double swap is shown in FIG. 4, and a few examples of single swaps related to double affixation or to the participation of two bonds in the same ring are shown in FIG. 5.

FIG.

I


Steric congestion

A

correct

possible

only

far too

much

procedure

after a

times

must

be

of 3-dimensional

complete

computation

to

many

evaluation

evaluate

speed.

Approximation

conformational

strategic

This

is also

Steric

At

this

examination

level is

most general correct

to

substrate steric

speak of molecule

approach,

of a particular

and

no mechanism

evaluate

steric

be performed feature

and Initial-

a

and focus on a way to extrapolate

defined

We have to work on the

general applicability.

by

(15).

to use to form the bond under

can thus be simulated. of

(13, 14) affecting

state of that mechanism

the reaction

hindrance

'congestion',

graph.

reaction mechanism,

for the transition

target processing,

still unknown; situation

of simplification

from the 2-D molecular

(frequency factor)

of the

must

its principal

Wipke (16)

as "a

It is

property

in its ground state, and thus only one part of the total

effect

more of the called

hindrance".

However of the

we can consider

most

SN2 steric

common

approximation

bonds,

Lilith

(17) on leaving

groups

After

end of

reactions

using

experimental

(and

congestion.

thus on

congestion

(13 and

is a closer

to

structures

the 70's

correctness

and the lack of additivity data. The other

effects

structure)

approximation

which

when especially mechanics

data.

In fact, SN2

addition

to

double

in SN2 depend much more

than

on the entering

to steric

hindrance

and

than it

congestion

to

there is a sound basis of

focused more on different

DeTar and co-workers

conclusions. already

(e.g. 2-butyl,

of steric effects

extension

calculations. confirmed But

more

(25) has no practical

(22,23) They

re-examined

essential

the set

of useful

used

by

3-pentyl,

No

diisopropyl)

any quantitative

congested utility

Ingold.

aspects

revised

the

(24) prohibits

structures

since the first

(18, 19).

and eventually

structures

the largest attention

As a consequence,

in theoretical

of Ingold's

reactions

received

work (19) attention

series of

of intermediate

for

the most useful

and thus it must have a higher weight

interpretations.

and hypothesis,

the same

from known

of established

average

that the use of

will bring

others (e.g.

Steric

the target

related papers).

fundamental

molecular

data

quantitative)

examined

etc.)

data and theoretical Ingold's

until the

limited

at

carbonyl,

a weighted

reactions.

studies

experimental

small amount than many

is also one of the reactions

mechanistic

congestions

more crowded

of steric

groups

and thus

be for other

SN2

with the

state is

the role

the alkyl

reaction must contain

there is a good confidence

as main factor to calculate

substitution

when considering

'undefined'

In particular,

can reach

activated

tetragonal

that this

mechanisms.

requirements

pentagonal

would

is a property

entropy

the target,

requires

(IAIA) approach.

of real steric hindrance

hindrance

the activation

for

is

that needs a fast

since this calculation

proposed

Thus we had to neglect every 3-dimensional an approximation

but such a calculation

coherent with the principles

/ Increasing-Accuracy

atom in a molecule

for use inside Lilith

In fact,

solution

at a particular

analysis,

time than is acceptable

gross hindrances.

for every

hindrance

(20) SN2

Ingold's

(and

almost

data is still extension

to

has been made, interpolation

than neo-pentyl,

sometimes

from the point of view of SN2

reactivity, but it would be necessary in order to have a homogeneous scale of general applicability. We could however find a way of getting from such a limited set of data an estimation within the desired approximation level.

Let C(i) be the corrected connectivity of the i-th atom, i.e. its substitution level, neglecting (zero-weighting) [...] substituents (phenyl group excluded). The congestion G.(i) at the i-th atom, which bears A alpha-substituents a_1 ... a_A, is expressed by a discrete classification of the connectivity of the molecular graph, around the i-th node, of the type:

$$G.(i) = f(\alpha, \beta_M, \beta_T)$$

where $\alpha = C(i)$, $\beta_T = \sum_{j=1}^{A} C(a_j)$, $\beta_M = \max_{j} C(a_j)$, and G. is an integer in the range [0 - 7]. The calculus of the

examination

skeleton atom

of G. for atom (i) is done in the following

for each

(i),

considered); At this

is cut; 2) atoms in the alpha sphere

(1) are

atoms

considered);

atom

It means

An example adrenosterone

@M)

can make

(b), cY=

sphere

The returned

and@T

1, @T

and their atomic

clearer.

proposed

(only skeleton

(CZ) (only excluding

atoms

(1) are

is done and atom (i) is assigned

to

(a), a=

Let us consider

considers

3, $\beta_T$ = 3, $\beta_M$

(a), the number

species

the breakage

(PT) and the

(17, 26, 27).

FIG. 3. The first example

is

of two bonds.

= 2, the class G. is equal to 5 (cfr. FIG.

= 3, flM = 2, the class G. is equal to 3 (cfr. FIG. 6). (a), Q= = 2,@M

is greater

1,pT

= 2, /$I = 2, the class G. is equal to 2 (cfr. FIG.

= 2, the class G. is equal to 2 (cfr. FIG. 6). than the bond 2 congestion.
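The counting steps listed in this passage (cut the strategic bond, count the alpha atoms of atom (i), count the beta atoms on each alpha atom excluding atom (i), then take the maximum and the total) can be sketched as below. The sketch stops at the triple of counts: the classification function f that maps the triple to a class G. is given in the paper only through FIG. 6 and is not reproduced. The adjacency layout, the function name and the example skeleton are hypothetical, and only skeleton atoms are assumed to be present in the graph.

```python
def congestion_descriptors(adj, atom, partner):
    """Return (alpha, beta_M, beta_T) for `atom` once the strategic bond
    atom-partner has been cut. `adj` maps each skeleton atom to its neighbours."""
    # Alpha sphere: neighbours of the atom, without the cut bond partner.
    alpha_atoms = [n for n in adj[atom] if n != partner]
    # Beta atoms on each alpha atom, excluding the atom under examination.
    beta_counts = [
        sum(1 for n in adj[a] if n != atom) for a in alpha_atoms
    ]
    alpha = len(alpha_atoms)
    beta_m = max(beta_counts, default=0)
    beta_t = sum(beta_counts)
    return alpha, beta_m, beta_t


# Hypothetical skeleton: atom 0 is bonded to partner 9 (strategic bond)
# and to alpha atoms 1, 2, 3; atom 1 carries two beta atoms, atom 2 one.
skeleton = {
    0: [9, 1, 2, 3], 9: [0],
    1: [0, 4, 5], 2: [0, 6], 3: [0],
    4: [1], 5: [1], 6: [2],
}
print(congestion_descriptors(skeleton, 0, 9))  # (3, 2, 3)
```

Both atoms of a strategic bond would be processed this way, once from each side, since the text assigns a congestion class to each of them.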

value G. is a rough classification

the first

two neighboring

are 34 possible

they are grouped

An example

values,

the number of alpha-substituents

take care also of the case of bridgeheads There

of atom (i) are counted

(PT) of alpha substituents, evaluated

is

bond object

of beta atoms on each alpha atom is evaluated.

the algorithm

2,flT

The bond 1 congestion

by

VM)

and the solution

atom (b), a=

generated

the alpha

of beta-substituents

For bond 2 we have: atom

6);

the number

1) the strategic

class.

For bond 1 we have: atom atom

3)

based on $\alpha$, $\beta_M$

that we consider

distribution

6);

in

4) the maximum

point a choice,

its congestion

way:

skeleton

by their G. value

for each class

of the steric

spheres. Where

congestion

at the atoms,

this value is modified

to

of ring systems.

arrangements

for a central

into 7 major classes

is reported

required,

in FIG.

6.

(28).

tetravalent

carbon atom,

and

If the nucleophile is a complex molecule, as it is in skeleton-forming reactions, chain branching around the attacking atom is also important (see refs. 114, 115 in 19) (even if not as much as chain branching around the attacked atom). Its congestion can be evaluated with the same criteria, and thus the calculus is applied to both of them.

Discussion and conclusion

The creation of a breakage sequence is an important part of a CAOS program, since it actually selects a very limited part of the synthetic tree and is a guideline for the reactivity subprograms. In Lilith it is executed at a very early stage of processing, and in spite of the insertion of some chemical heuristics, it is essentially topology-based, like the previous strategic section.

This coherence ensures the self-consistency of the system, but neglects some factors which could eventually prove very important; e.g., approximations in congestion evaluation. Besides,

Lilith

type of

reactions

However,

the

structure The properties

next step

to

these

of Lilith

of the target

problems

having any definite

bonds, or on the interferences was

partially

process

is

foreseen

the calculation

(4) that are used to address

the approach

strategic

the reactions according

answer

at this point without

to create these

when

idea either

on the

that will arise. Lilith's

global

was first planned.

So, we solve selecting

arrives

needed

to reactivity

by reversing

bonds after the identification

will be

chosen, together

to the bond breaking

sequence.

of the

outstanding

electronic

reactivity. the more common procedure:

instead

of useful / known / feasible

with the necessary

group transforms

of

reactions,

/ additions,

The

first

currently

interference

being

the reaction

tested,

block, which

The

reactivity

of hindrance

and interference in Lilith

generates

number

shown

in

is a future

analysis

foreseen

improvement,

to be applied

level.

is always from approximation

towards

a more and more exact

matrix,

the

the different INPUT of

activities the

identification

atomic

3) The STRATEGic

phase

strategic

unacceptable

of the program

target is

followed

of atomic species some

molecular

weights,

part finds the MSSBs

5) POLARITIES

identifies

the natural

polarities

6) REACTIVITY

is concerned

with the absolute

possible

of processing

with a time.

are reported.

by its

DECODING

and of bond orders, are

(creation

of

the

etc.).

identified:

rings

(SSSR),

etc.).

bond forming

conditions.

an

(1).

the optimal

evaluates

increasing

features

4) BOND ORDER determines

7) INTERFER

to process

7, this will allow the use of a widely recursive flow-chart,

PERCEPTION

aromaticity,

which are evaluated

a small number of solutions,

at every loop, but without

1) An alphanumeric adjacency

reaction

and is

and consequently

(the 'best' ones only) at any time.

FIG.

reliability

In the figure

2) In

sequence

(IAIA approach).

even smaller

higher

trend

strategy

As

the bond forming

plan.

The general knowledge

is three steps away from this subroutine

will be able to modify

A more exact 3-D analysis at a deeper


interferences

sequence

(the present

algorithm).

of bonds.

bond reactivity. either with other bonds

in the target or with


8)

REAC.COND.

suggests

9) EVALUATION As evidenced

is a central

of the search

a value

to each decision made by the program.

exchange

by the results of the other blocks.

towards

his

teach cannot

to

be

still unfulfilled,

If

approximations.

block 5 is also complete

activities evolution

and will be

our more

one of the first reviews about CAOS (6), stated

for "an effort by chemists

imagination

are

and their a continuous

paper.

expectation

chemist's

information This allows

better suggestions.

of a future

expectations

We

block that assigns

far back as 1974, Bersohn, writing

repeatedly "the

conditions.

1, 2, 3, 4 and part of 9, are already working,

the object As

reaction

by the arrows most of the blocks

can be influenced

Blocks

optimal

dissected, and

effective

this

method

to enunciate

analyzed is

the

and

root

precise

made of

automatic".

CAOS

is a trial-and-error

rules" and for

expert

These systems'

one, the computers

we

do better.

We wrote our algorithm in Fortran 77 and implemented it on a Honeywell-Bull X-20 microcomputer, using the SVS Fortran compiler.

Acknowledgement. Partial financial support by Tecnofarmaci S.p.a. is acknowledged. Two of us (L. Baumer and G. Sala) acknowledge a postdoctoral fellowship from Tecnofarmaci.

References. 1

Baumer I gaumer

L., L.,

A., Tetrahedron.

sJ;n Physical

I lY83)

and

Theoretical

Chemistry",

'a! 1 G. of all the bonds of the solution

suppl.

28, 206, ed. R.B.King,

is calculated.

Solutions

I='= "'i6b:'s484 sot., Org.React.

d’ F

Binzet S Richard& T: Hi:___:

1, 37,

with a

(1978)

(Tartu), 300 (1986) (C.A. 108,

J. Org. Chem. Oarben P 52, zO74 (lgQ7) K. S., "fiechanism and TheaPy in Organic Chemistry",

Harper

&