Tefrahedron Vol. 45, No. 9, pp. 2665 to 2676, 1989 F’rinted in Great Britain.
ORGANIC
0
SYNTHESIS
PLANNING:
STRATEGIC
Luca BAUMER(#),
AN ALGORITHM
BOND FORMING
Giordano
CO4&4020/89 S3.00+ .OO 1989 Pergamon Press plc
FOR SELECTING
SEQUENCES
SALA, Guido SELL0 *.
(Received in UK 3 1 January 1989)
Introduction. This paper for OSP
is part of a series
presently
topological principles
analysis on
following
which
Previous as
Lilith
is
of
sometimes outputs
set of
our program Previous
systems
Lilith,
papers
perception,
bonds
which
a system
were
about
and the logical
bonds is based. Of course,
strategic
all of the
constitutes
the
in the next section we recall some of
problem;
intermediate
rules to select a precursor
on
in order to reduce the otherwise bond is to be made
an implicit a
consequence.
very limited
principle
set of
On the contrary, possible
order
'inside'
space of the solutions;
each synthetic
point
the strategic
solutions
in the
section
of
for each fragmentation
of fragmentations);
of the most suited
of the
the way from the simple starting enormous
'before' or 'after' a certain
grants a low number
to the identification
so we consider
the
passage more as an optimization
precursors
than as a method
to prune
tree.
Hendrickson, convergent
(or privileged)
bond formation
the synthetic
for ring
in the field (5) often use heuristic
convergence
step devoted
department.
of the strategic
on the
to the synthesis
a particular
already
(and the sorting
selection
to the target,
synthesis
our
and terminology.
works
that
at
target molecule
must work
a necessary
materials deciding
the
solution
the main concepts
(l-4) in which we describe
development
of a
procedures
retrosynthetical
target
under
whose
SYNGEN
retrosynthesis,
(7) is
currently
the
has already demonstrated
is the sequence: 2665
paramount
example
of a program
(8) that the optimized
synthesis
for path
L. BAUMER et al.
2666
1) starting
materials
2) intermediate 3) precursor
/
1
affixation
1
/ 2
neglecting
the
refunctionalization
necessary,
depending
/ cyclization
on target's
the two
cyclization
are
which
must be
bond
defined
the order
At
Furthermore,
Lilith premises Depending breaking
(or
ambiguity
to
walk
because
on the
Lilith's
and
construction
many
a cycle
is presently
the ideal
times
as
given to select
(9) that "bondsets may be further But how
of an acyclic is
affixation
order of cyclizations.
is this "ordered
the order of bond making
in a bondset"
is
compound.
taken care
to the
of;
refunctionalizations,
first fragmentation;
starting materials
if a requirement
is not fulfilled,
of
either one of the
for a new, analogous
speak of 'bond forming which are
(or making)
the reversal
processing.
so,
'bond
breaking'
path,
and
sequence'
of each
Lilith does not allow skeleton
reaction
"complexity" each fragment
and mutual
strategy
always
other.
However,
disconnections
implies
conversely
or of 'bond
'bond
a
no
on the way
retrosynthetical
forming'
is the well-known
principle
implies
of convergence.
is the one which makes the target out of two fragments
the smaller
approach,
a
the complexity
is defined as well
(1) as
difference, a simple
as their species,
the
It means of similar
better the solution.
function
that considers
substitution
In our
the number of
levels, stereochemistries
positions.
From the definition two atoms)
follows:
Distance the
The
is
are constructed".
defines
fragmentations
sequence',
best solution
convergent
constitutes
in fact he stated
refer only
target;
atoms
between
as
direction.
complexity,
in
which
or both, can become the target
arise,
basis of
that the
one, and
/ available
cleaving)
precursors
The
formation
(the bonds)
on the context, we
(antithetical)
2-3
and terminology
should
synthetical
steps
etc. will be evoked at a later stage.
we always
obtained,
iterating
and no theory or heuristics
only skeleton
small / simple
two synthons
from
in which
protections,
2
size.
The answer "convergency
this level
and
bonds whose
only for the subsequent
activations,
reaching
steps
aware of the problem:
chosen?
sufficient
/ intermediate
the affixation
was
by
or more
I;
1
/ target
interchangeable,
Hendrickson
bondset"
/ precursor
affixation
4) intermediate
However,
/ intermediate
/ cyclization
of atomic complexity,
it is the shortest
calculation farthest
points
retrosynthesis output of
solution,
in
contains
only
fragmentation;
turn,
of
is
the "centre"
the
a set
All
the
to
(between
as a set of atoms halfway
strategic
bonds
chosen
for
is a set of solutions;
each
atoms.
section, which we call MSSB,
subgraph
distance"
them.
of the molecule
structure.
of strategic
necessary
synthetic
of "complexity
path joining
span from these central
the strategic
the bonds the
identifies
the concept
complexity
bonds
split the representing
(a
subset of
target a
into
the MSSB).
Each solution
two parts for a particular
single solution
contains
only the
2667
Organic synthesis planning
intermediates
originating
into the double The
object
i.e. choosing possible
sequences. a single
alternative
sequences,
and
they all converge
is the sorting of the N bonds in every proposed
this paper,
the 'best' chronological
may coincide,
bond forming
leaf of precursors. of
made
by
from different
When exiting unbranched
i.e. the
sequence
this procedure,
first breakage(s)
representation
the synthetic
for each solution.
line
graph
Of course,
is reduced to a tree
some intermediate
can be comnon to some solutions;
of the final synthetic
INITIAL
graph, as shown
in FIG.
TREE
LILITH
1.
TREE
TREE FIG.
The bond;
procedure
this
applies
is equivalent
one or more the target
solutions
only to
which require the breaking
to say that the procedure
rings in their centre.
I
of more than one
applies only to substances
This follows from the forcing
into two (and no more than two) separate
points
this allows an
FINAL
C(XPLEl-E
MREmTEwFE
solution,
in which they must be formed among the N!
requirement
that contain which divides
parts.
Principles
No absolute process
of
sensitivity. coherence
rule applies
synthesis A
with
computer
stated
that
from them consider
the most can be
rules, though. section
ease of reaction,
main criterion
important
when deciding
MSSB;
to
relies upon We chose
efficiency,
atoms
the bond forming
these rules
experience
applying
principles.
logic
and and
We thus examine
and ring closures.
that is, adherence
are the
consequently,
in a not-computerized
the chemist's
some basic, general
steric hindrance
is strategic
part of
like many other points
it generally
needs
the strategic
bond centrality, The
to this subject;
planning,
to strategic
rules. We
central ones and that only bonds spanning bond centrality
sequence.
is also the first topic to
2668
L. BAUMERet al.
Another reaction hindrance part
important
can
evaluation.
selects
follows
point to consider
take place
reaction,
intramolecular
reactions
point of It
routine
that the
one: this
is
first bond
while all the following are easier;
view of
is therefore
are not heavily Mutual
an intermolecular
reason the
the consideration
intermolecular
from the
to
ease with which an intramolecular
divided
is related
reactions
in particular,
steric accessibility, sensible
is the
only one
made
bonds.
This
through
an
In general,
are intramolecular.
they have
less strict requirements
if they have sufficient
to choose to prepare
to steric
into two parts: the first
the first bond to be made, and the second sorts all the remaining
from
freedom.
compared
For this
is the different
conformational
first a bond whose reaction
sites
hindered.
topological
relationship
between
bonds
is the
third factor necessary
to the
task.
The algorithm
In
order to
fragmented
select
cutting
that all
the
the necessary
representation substituted processed
of
to
bond and
the molecular
saturate
again
the
by the
the identification
longest
existing we
points
in the
now
respectively
distinguish
between
extremes,
From which
structure
is
structure
So,
with
'similar' to
of the structures
minimal
MAXDi
with those of the target if this
bond is
product
which
is as
perform
the
maximum
strategic
subsequent
and convergence
(11).
target's
(This means
the
computer's
group is added or new
structure
joining
path is the
between molecular
MAXD)
and
is
distances
extremes
MAXDl, MAXDP...,
bonds 1, 2..., are not cleaved.
decreases
of these structures
extremes. the
Secondly,
Two will
no MAXDi
number of possible
paths
path can appear. strategic
(with respect
bonds, it follows
to the target);
that the bond
it means
that this
This should also be graphically also shows that the partially
which maximizes
the coincidence
evident cleaved of
the
(10). first,
similar as possible ring
bonds
the one
nearly completely
of complexity
shortest
rule, the extremes
the target.
constructed
efficiency,
(the
in FIG. 2, which
is also
This
atoms whose
atoms and
is the most central
the most
atom or
to the calculation
no new shorter
of central
No
nor with the target's
breaking
is
atoms.
(1) this distance
as a general
extremes
target
that constitute
modified).
devoted
MAXDO
the
separated
couples of
because
and, of course,
MAXDi
from a comparison
extremes
MAXDO,
are
where only the strategic
firstly,
our definition
minimizes
formed,
bonds of the current solution. matrices
the
We called
between
with each other's
less than
of
(3)
of the
in the structure
can be
structure
valence
molecule.
must be stressed:
not coincide
to be
connectivity
procedure
and to
MAXD;
first bond
all but one of the strategic
closures which
in
the first
synthetic
to the target
itself,
the desired
direction.
follows
directly
step
will arrive
at a
and it should be easier This is the principle
to of
from the joining of simplification
Organic synthesis planning FlG. 2 exemplifies
the results
members
couples,
of extreme
for a few structures.
2669
(Dots highlight
atoms that are
whose MAXD is reported.)
l F”
oon
c$ F9 l
l
0
0 Y
0
LHXD = 21
l
lwm
= 21
UN
l
l
FIG.
The simplification both structures produce
made on the
complete
there are three strategic
6 possible
solutions,
to the one shown in the first
synthetic
bonds whose
that the algorithm
tree is breakage,
2
evident without
in the
figure.
an ordering,
cuts down to 1. (The situation
half of FIG. 1, in the left part of the tree.)
In
would
is similar
2670
L.
Let us now consider
the
molecule
BAUMER
et
of.
of adrenosterone
to
explain
the
work done by
the
algorithm. The target
has 3 couples
(formed by 5 atoms) with MAXD equal to 25, the first solution
1 couple with MAXD equal to 34 and second
solution
target,
only 1 atom
the third solution
Strategic
efficiency
is
thus the
will be the last, the others basic criterion
intermolecular Congestions
giving the
starting
one)
of
is difficult
quantified
parameterization
we use
to order
structure
overridden
if
the
first
through
a
of G.A
MAXDi.
described
heuristic
(the site.
bond A-8, are classified
below. The total congestion
function,
and G.6 (12). If
reaction
at the reaction
which
is
essentially
the i-th bond has minimum
< (G.T(i) - G.T(j)),then
G.T(i)
MAXDi,
of bond an
additive
bond j replaces
a
bond i in the
sequence.
Another
possibility
two bonds
spanning
particular
reaction
cyclizations
three
A-B
but there is
of this kind of swap.
FIG.
to
which
the
reaction
the target) bonds a,
in double-affixation bond between
at the
database),
same time
than forming
contains no but
from
can often has a
a single,
experience
be performed
higher
reference it
is
to
known
any that
in a single synthetic
strategic
though central,
3
case (breakage of
explicit
efficiency
(closer
bond. So, if there are
b, c such that MAXDa < MAXDb and MAXDa < MAXDc,
with each other, while a is not, then a is replaced
but b and c are
by the lower-MAXD
b and c.
At this point, the last breakage the others
is the double-affixation
(1)). Lilith
a double-affixation
two bonds
strategic
must be checked
same atom
(no
involving
Forming
approach
most
the bond forming
resulting
atoms A and B, which must form the i-th strategic
FIG. 3 shows two examples
step.
the with
the bond with the largest
because of heavy steric congestion
bond such that (MAXDj - MAXDi)
priority
main criterion
MAXDi must be formed first,
(if any) in order of increasing
could be
as G.A and G.6 by the subroutine
j-th
the solution
the
with 2 of
has 3 couples with MAXD equal to 30 and 3 atoms coinciding
the bond with the smallest
This
is
coinciding
has
target,
to the target.
sequence: MAXDi
with one of the
has 3 couples with MAXD equal to 33 and 2 atoms
3 of the target. This last is evidently similar
coinciding
is entered.
is definitively
chosen,
and the procedure
which sorts
Organic synthesis planning purpose
Its
is
to obtain
cyclizations.
The
bond,
in the position
if any,
The adjacency have already structure,
bonds,
stated
either
containing as the
them
step
sequence
is to
place the
adjacent
to it.
which
bonds
in
maximizes
the
ease
double-affixation
that
an MSSB
containing
fused or bridged, pair of
both. But
defined
and
algorithm
we always
gives
than two
the
bonds implies
level. We
a polycyclic
at the center of the molecule.
strategic
described
more
of
with the best
of bonds which are part of the same ring is at a lower priority
for every
So, (SSSR),
first
a breakage
2671
in
precedence
bonds,
we could
refer only
to the
a previous
paper
to
the creation
find a
ring or envelope
Smallest (3).
Set
of rings
of Smallest
Rings
Using this SSSR for sorting
of the
smallest
and
'relevant'
rings. An example to double
FIG.
of a double
affixation
or
swap is shown in FIG.
4 and few examples of single swaps related
to the
of two bonds to the same ring are shown in
participation
5.
FIG.
I
L. BAUMER et al.
2672
Steric
congestion
A
correct
possible
only
far too
much
procedure
after a
times
must
be
of 3-dimensional
complete
computation
to
many
evaluation
evaluate
speed.
Approximation
conformational
strategic
This
is also
Steric
At
this
examination
level is
most general correct
to
substrate steric
speak of molecule
approach,
of a particular
and
no mechanism
evaluate
steric
be performed feature
and Initial-
a
and focus on a way to extrapolate
defined
We have to work on the
general applicability.
by
(15).
to use to form the bond under
can thus be simulated. of
(13, 14) affecting
state of that mechanism
the reaction
hindrance
'congestion',
graph.
reaction mechanism,
for the transition
target processing,
still unknown; situation
of simplification
from the 2-D molecular
(frequency factor)
of the
must
its principal
Wipke (16)
as "a
It is
property
in its ground state, and thus only one part of the total
effect
more of the called
hindrance".
However of the
we can consider
most
SN2 steric
common
approximation
bonds,
Lilith
(17) on leaving
groups
After
end of
reactions
using
experimental
(and
congestion.
thus on
congestion
(13 and
is a closer
to
structures
the 70's
correctness
and the lack of additivity data. The other
effects
structure)
approximation
which
when especially mechanics
data.
In fact, SN2
addition
to
double
in SN2 depend much more
than
on the entering
to steric
hindrance
and
than it
congestion
to
there is a sound basis of
focused more on different
DeTar and co-workers
conclusions. already
(e.g. 2-butyl,
of steric effects
extension
calculations. confirmed But
more
(25) has no practical
(22,23) They
re-examined
essential
the set
of useful
used
by
3-pentyl,
No
diisopropyl)
any quantitative
congested utility
Ingold.
aspects
revised
the
(24) prohibits
structures
since the first
(18, 19).
and eventually
structures
the largest attention
As a consequence,
in theoretical
of Ingold's
reactions
received
work (19) attention
series of
of intermediate
for
the most useful
and thus it must have a higher weight
interpretations.
and hypothesis,
the same
from known
of established
average
that the use of
will bring
others (e.g.
Steric
the target
related papers).
fundamental
molecular
data
quantitative)
examined
etc.)
data and theoretical Ingold's
until the
limited
at
carbonyl,
a weighted
reactions.
studies
experimental
small amount than many
is also one of the reactions
mechanistic
congestions
more crowded
of steric
groups
and thus
be for other
SN2
with the
state is
the role
the alkyl
reaction must contain
there is a good confidence
as main factor to calculate
substitution
when considering
'undefined'
In particular,
can reach
activated
tetragonal
that this
mechanisms.
requirements
pentagonal
would
is a property
entropy
the target,
requires
(IAIA) approach.
of real steric hindrance
hindrance
the activation
for
is
that needs a fast
since this calculation
proposed
Thus we had to neglect every J-dimensional an approximation
but such a calculation
coherent with the principles
/ Increasing-Accuracy
atom in a molecule
for use inside Lilith
In fact,
solution
at a particular
analysis,
time than is acceptable
gross hindrances.
for every
hindrance
(20) SN2
Ingold's
(and
almost
data is still extension
to
has been made, interpolation
than neo-pentyl,
sometimes
from the point of view of SN2
Organic synthesis planning
reactivity,
but
it would
be necessary
in order
2613
to have
a homogeneous
scale of general
applicability. We
could however
within
the desired
Let
C(i) be
neglecting The
approximation
zero-weighting
by
a way of getting from such a limited set of data an estimation level.
the corrected
congestion
expressed
find
connectivity
(phenyl
G.(i)
a discrete
of
i-th atom,
group excluded)
i.e. its
substitution
at the i-th atom, which bears A alpha-substituents classification
level
substituents.
of the connectivity
of the molecular
al... aA, is graph, around
the i-th node, of the type:
G.(i) = f (ff,@M, flT)
where
a=
C(i), flT =z(j=l,
A) (C(aj)), @4
= MAX (C(aj))
and G. is an integer
in the range
[O - 73. The calculus of the
examination
skeleton atom
of G. for atom (i) is done in the following
for each
(i),
considered); At this
is cut; 2) atoms in the alpha sphere
(1) are
atoms
considered);
atom
It means
An example adrenosterone
@M)
can make
(b), cY=
sphere
The returned
and@T
1, @T
and their atomic
clearer.
proposed
(only skeleton
(CZ) (only excluding
atoms
(1) are
is done and atom (i) is assigned
to
(a), a=
Let us consider
considers
3,/?T = 3,PM
(a), the number
species
the breakage
(PT) and the
(17, 26, 27).
FIG. 3. The first example
is
of two bonds.
= 2, the class G. is equal to 5 (cfr. FIG.
= 3, flM = 2, the class G. is equal to 3 (cfr. FIG. 6). (a), Q= = 2,@M
is greater
1,pT
= 2, /$I = 2, the class G. is equal to 2 (cfr. FIG.
= 2, the class G. is equal to 2 (cfr. FIG. 6). than the bond 2 congestion.
value G. is a rough classification
the first
two neighboring
are 34 possible
they are grouped
An example
values,
the number of alpha-substituents
take care also of the case of bridgeheads There
of atom (i) are counted
(PT) of alpha substituents, evaluated
is
bond object
of beta atoms on each alpha atom is evaluated.
the algorithm
2,flT
The bond 1 congestion
by
VM)
and the solution
atom (b), a=
generated
the alpha
of beta-substituents
For bond 2 we have: atom
6);
the number
1) the strategic
class.
For bond 1 we have: atom atom
3)
based on cx,@M
that we consider
distribution
6);
in
4) the maximum
point a choice,
its congestion
way:
skeleton
by their G. value
for each class
of the steric
spheres. Where
congestion
at the atoms,
this value is modified
to
of ring systems.
arrangements
for a central
into 7 major classes
is reported
required,
in FIG.
6.
(28).
tetravalent
carbon atom,
and
L. BAUMERet al.
2614
If the nucleophile branching
around
not as much Its
can
atom is also
as it is in skeleton-forming important
around the attacked
be evaluated
with the
reactions,
chain
(see refs. 114, 115 in 19) (even if
atom).
same criteria,
and thus
the calculus
is
to both of them.
Discussion
and conclusion
The actually
creation
of
selects
a very
reactivity In
a breakage sequence limited part
is an important
of the
part of a CADS program,
synthetic
tree
since
and is a guideline
it
for the
subprograms.
Lilith
insertion
it
is executed
of some chemical
strategic
at a
very early
heuristics,
stage of processing,
it is essentially
and in spite of the
topology-based,
like the previous
section.
This which
as chain branching
congestion
applied
is a complex molecule,
the attacking
coherence
could
ensures
eventually
the self-consistency prove
very
of the system, but neglects
important;
e.g.,
approximations
some factors
in
congestion
evaluation. Besides,
Lilith
type of
reactions
However,
the
structure The properties
next step
to
these
of Lilith
of the target
problems
having any definite
bonds, or on the interferences was
partially
process
is
foreseen
the calculation
(4) that are used to address
the approach
strategic
the reactions according
answer
at this point without
to create these
when
idea either
on the
that will arise. Lilith's
global
was first planned.
So, we solve selecting
arrives
needed
to reactivity
by reversing
bonds after the identification
will be
chosen, together
to the bond breaking
sequence.
of the
outstanding
electronic
reactivity. the more common procedure:
instead
of useful / known / feasible
with the necessary
group transforms
of
reactions,
/ additions,
Organic synthesis planning The
first
currently
interference
being
the reaction
tested,
block, which
The
reactivity
of hindrance
and interference in Lilith
generates
number
shown
in
is a future
analysis
foreseen
improvement,
to be applied
level.
is always from approximation
towards
a more and more exact
matrix,
the
the different INPUT of
activities the
identification
atomic
3) The STRATEGic
phase
strategic
unacceptable
of the program
target is
followed
of atomic species some
molecular
weights,
part finds the MSSBs
5) POLARITIES
identifies
the natural
polarities
6) REACTIVITY
is concerned
with the absolute
possible
of processing
with a time.
are reported.
by its
DECODING
and of bond orders, are
(creation
of
the
etc.).
identified:
rings
(SSSR),
etc.).
bond forming
conditions.
an
(1).
the optimal
evaluates
increasing
features
4) BOND ORDER determines
7) INTERFER
to process
7, this will allow the use of a widely recursive flow-chart,
PERCEPTION
aromaticity,
which are evaluated
a small number of solutions,
at every loop, but without
1) An alphanumeric adjacency
reaction
and is
and consequently
(the 'best' ones only) at any time.
FIG.
reliability
In the figure
2) In
sequence
(IAIA approach).
even smaller
higher
trend
strategy
As
the bond forming
plan.
The general knowledge
is three steps away from this subroutine
will be able to modify
A more exact 3-O analysis at a deeper
2675
interferences
sequence
(the present
algorithm).
of bonds.
bond reactivity. either with other bonds
in the target or with
L. BAUMERet al.
2676
8)
REAC.COND.
suggests
9) EVALUATION As evidenced
is a central
of the search
a value
to each decision made by the program.
exchange
by the results of the other blocks.
towards
his
teach cannot
to
be
still unfulfilled,
If
approximations.
block 5 is also complete
activities evolution
and will be
our more
one of the first reviews about CAOS (6), stated
for "an effort by chemists
imagination
are
and their a continuous
paper.
expectation
chemist's
information This allows
better suggestions.
of a future
expectations
We
block that assigns
far back as 1974, Bersohn, writing
repeatedly "the
conditions.
1, 2, 3, 4 and part of 9, are already working,
the object As
reaction
by the arrows most of the blocks
can be influenced
Blocks
optimal
dissected, and
effective
this
method
to enunciate
analyzed is
the
and
root
precise
made of
automatic".
CAOS
is a trial-and-error
rules" and for
expert
These systems'
one, the computers
we
do better.
wrote
microcomputer,
our
algorithm
in
Fortran 77
using the SVS Fortran
Acknowledaement acknowledged. Tecnofarmaci.
and implemented
it on
a Honeywell-Bull
X-20
compiler.
by Te nofarmaci S.p.a. Partial financial sup ort Two of us (L. Baumer and 6. ! ala) acknow $edge a postdoctoral '?ellZi;f~'$
References. 1
Baumer I gaumer
L., L.,
A., Tetrahedron.
sJ;n Physical
I lY83)
and
Theoretical
Chemistry",
'a! 1 G. of all the bonds of the solution
suppl.
28, 206, ed. R.B.King,
is calculated.
Solutions
I='= "'i6b:'s484 sot., Org.React.
d’ F
Binzet S Richard& T: Hi:___:
1, 37,
with a
(1978)
(Tartu), 300 (1986) (C.A. 108,
J. Org. Chem. Oarben P 52, zO74 (lgQ7) K. S., "fiechanism and TheaPy in Organic Chemistry",
Harper
&