Modify Atoms in Results to Achieve Binding Organic Units
The structure generation phase, SPIDER, produces skeletons that contain no
information concerning atom type. The vertices of the skeletons are described by
hybridisation states alone and connections between vertices as bond types. Atoms
must be substituted onto these skeletons to (1) promote binding of the ligand to
the receptor, (2) stabilise certain bonding situations or conformations, (3) confer
certain physical properties such as transport properties and (4) facilitate
synthesis. This section describes the program MARABOU [1], which addresses the
problem of atom substitution in order to confer the appropriate character at
binding sites, e.g. hydrogen bond donor or hydrogen bond acceptor.
Properties are assigned to target sites prior to structure generation and are used
within this phase to ensure the appropriate heteroatoms are substituted for
binding. The following properties can be assigned to target sites: hydrogen bond
donor, hydrogen bond acceptor, either hydrogen bond donor or acceptor, positive
charge, negative charge or neutral. These properties are either manually assigned
to target sites or identified automatically using the HIPPO program.
Overview
The atom substitution program produces molecules by substituting combinations
of functional groups onto the molecular skeletons. The figure illustrates the use
of expert system technology to perform atom substitution. This provides a very
flexible approach to the development of the program since all information
concerning the functional group substitution is contained within a separate
knowledge base. This library can be readily updated without recourse to re-programming.
Each functional group substitution entry contains two parts. The
first part is a description of the appropriate site of the skeleton onto which the
heteroatoms of the functional group can be substituted. A linear string notation to
describe molecular substructures is used to define this region. This language is
based on the PATRAN language [2] developed for the LHASA synthesis design program and
is also quite similar to the SMILES [3] notation. The second part of the functional group
substitution entry is a rule describing the necessary atom and bond substitutions.
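The two-part structure of an entry can be pictured with a small data-structure sketch. This is not the format of the MARABOU knowledge base, which is written in a PATRAN-like string notation; the class names, field names and the example entry below are all hypothetical and serve only to show how a pattern and its substitution rule might sit side by side.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PatternVertex:
    """One 'X' vertex of the substructure pattern (part one of an entry)."""
    attached_hydrogens: Optional[int] = None  # corresponds to an [HS=n] feature
    required_property: Optional[str] = None   # e.g. "HACCEPTOR" or "NEGATIVE"


@dataclass(frozen=True)
class SubstitutionEntry:
    """A knowledge-base entry: a pattern plus the substitutions it triggers."""
    name: str
    vertices: Tuple[PatternVertex, ...]
    bonds: Tuple[Tuple[int, int, int], ...]   # (vertex i, vertex j, bond order)
    # Part two of the entry: which pattern vertices and bonds are rewritten.
    atom_substitutions: Dict[int, str]        # pattern vertex index -> element
    bond_substitutions: Dict[Tuple[int, int], int] = field(default_factory=dict)


# Invented example for illustration only; it is not taken from the
# actual knowledge base and its chemistry is an assumption.
EXAMPLE_ENTRY = SubstitutionEntry(
    name="illustrative three-atom entry",
    vertices=(
        PatternVertex(attached_hydrogens=2, required_property="HACCEPTOR"),
        PatternVertex(),
        PatternVertex(attached_hydrogens=3, required_property="NEGATIVE"),
    ),
    bonds=((0, 1, 2), (1, 2, 1)),             # X=X-X
    atom_substitutions={0: "O", 2: "O"},
)
```

Keeping entries as plain data, separate from the matching code, mirrors the point made above: the set of substitutions can grow without changes to the program itself.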
Rule
The above functional group substitution rule describes a three atom skeleton
substructure onto which the appropriate atom substitution is performed. 'X'
denotes a vertex, since the atom type of the skeleton is not defined at this
stage, and '-' and '=' represent single and double bonds respectively.
The atom features [HS=2] and [HS=3] define the number of attached hydrogens
(assuming the skeleton vertices are carbon). This ensures that atoms 1 and 3 have
no connections other than those defined. The properties of a vertex within the
pattern, i.e. HACCEPTOR and NEGATIVE, must match the hydrogen bonding or
electrostatic interaction properties assigned to the vertices of the skeleton.
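The matching requirement can be sketched as a simple compatibility test. The property names follow the list of target site properties given earlier; the treatment of the "either donor or acceptor" property (it accepts a pattern vertex of either kind) and the function name `vertex_matches` are assumptions made for illustration, not a description of the MARABOU code.

```python
from enum import Enum
from typing import Optional


class SiteProperty(Enum):
    HDONOR = "hydrogen bond donor"
    HACCEPTOR = "hydrogen bond acceptor"
    HDONOR_OR_HACCEPTOR = "either hydrogen bond donor or acceptor"
    POSITIVE = "positive charge"
    NEGATIVE = "negative charge"
    NEUTRAL = "neutral"


def vertex_matches(
    pattern_property: SiteProperty,
    pattern_hydrogens: Optional[int],
    site_property: SiteProperty,
    site_hydrogens: int,
) -> bool:
    """Return True if a pattern vertex is compatible with a skeleton vertex."""
    # An [HS=n] feature, when present, must equal the hydrogen count
    # computed for the skeleton vertex (taken to be carbon).
    if pattern_hydrogens is not None and pattern_hydrogens != site_hydrogens:
        return False
    if pattern_property is site_property:
        return True
    # Assumption: a site marked "either" accepts donor or acceptor patterns.
    return (
        site_property is SiteProperty.HDONOR_OR_HACCEPTOR
        and pattern_property in (SiteProperty.HDONOR, SiteProperty.HACCEPTOR)
    )
```

For instance, a pattern vertex requiring HACCEPTOR with [HS=2] would match a skeleton CH2 vertex whose target site is marked either as an acceptor or as donor-or-acceptor, but not one marked as a donor.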
To generate a series of molecules from a particular skeleton, a number of
operations are performed. Initially the skeleton structure is analysed and
information concerning the target site properties, the chemically significant rings,
the hydrogens attached to each atom (assuming each atom in the skeleton is a
carbon) and the aromatic atoms and bonds is established. This information is
generated to ensure correct mapping against the properties assigned to atoms and
bonds within the PATRAN statements. The next stage is a substructure search
where an attempt is made to match each PATRAN substructure statement against
the skeleton. This produces a list of identified substructures that are used to fire
the individual rules, resulting in a list of atom and bond substitutions
corresponding to each valid rule. All combinations of functional group
substitutions are generated and applied to the skeleton to produce a set of
molecules. Combinations of functional group substitutions whose atoms overlap
are discarded.
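The final combination step can be illustrated with a short sketch. It assumes that each fired rule has already been reduced to a name and the set of skeleton atoms it would modify; the function name, the data layout and the example rule names are hypothetical. The exhaustive enumeration shown here grows exponentially with the number of fired rules, and nothing is implied about how the program itself organises the search.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

FiredRule = Tuple[str, FrozenSet[int]]  # (rule name, skeleton atoms it touches)


def non_overlapping_combinations(fired: Sequence[FiredRule]) -> List[Tuple[str, ...]]:
    """Enumerate every non-empty combination of fired rules whose atoms are disjoint."""
    accepted: List[Tuple[str, ...]] = []
    for size in range(1, len(fired) + 1):
        for combo in combinations(fired, size):
            used: set = set()
            overlaps = False
            for _name, atoms in combo:
                if used & atoms:          # an atom is claimed twice: discard
                    overlaps = True
                    break
                used |= atoms
            if not overlaps:
                accepted.append(tuple(name for name, _atoms in combo))
    return accepted


# Hypothetical fired rules on a six-atom skeleton (atom indices 0..5).
fired_rules = [
    ("carboxylate", frozenset({0, 1, 2})),
    ("hydroxyl", frozenset({2, 3})),
    ("amine", frozenset({5})),
]
molecule_recipes = non_overlapping_combinations(fired_rules)
```

In this example the carboxylate and hydroxyl substitutions both claim atom 2, so any combination containing both is rejected, while each may still be combined with the amine.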
Example
This example illustrates the atom substitution phase in SPROUT. A skeleton
structure has been generated to satisfy the six target sites shown. The required
properties at each target site are presented. The program identifies a number of
functional groups that map onto the skeleton. The three structures produced are
generated through different combinations of functionality.
This section has described the atom substitution phase within the SPROUT
program for de novo molecular design. For a given target skeleton, a series of
molecules is produced containing the appropriate functionality to enhance the
binding characteristics of the molecule. The program uses a knowledge base of
46 functional group substitutions. This provides a flexible approach to the future
development of the program. Work is currently underway to allow further substitution
by heteroatoms onto structures to enhance the ease of synthesis.
SPROUT typically generates thousands of potential structures and hence methods
of prioritising sets of structures are essential to its practical use. The program
CAESA has been developed to rank large sets of structures according to an
estimate of synthetic ease. The program is an expert system and incorporates
knowledge concerning various aspects of the structural complexity of molecules. Additionally, the
program is linked to a database of starting materials, and a fast method of
automatically selecting precursors has been developed. This information allows
the program to identify structurally complex molecules that are easy to synthesise
because of the availability of suitable starting materials.