De Novo Ligand Generation and Docking
Poster presented at the 36th Buffalo Medicinal Conference, May 1995.
An automatic, interactive computer system for de novo structure-based
molecular design, called SPROUT, is currently under development at the
University of Leeds. The system consists of several modules addressing
different subproblems of structure-based drug design: detection of
protein clefts, identification of potential interaction sites, primary
molecular structure generation, conversion of primary structures into
molecules, and analysis of the solutions. This poster outlines the
primary structure generation method together with the docking of the
generated molecular structures into a receptor site.
Structure generation is exhaustive within the bounds defined by the
constraints and uses novel systematic graph searching rather than random
techniques. The best solution consistent with the constraints is
therefore guaranteed to be found.
The generated partial structures are docked into the receptor site at
every step of the search using a very fast (several hundred structures
per second), purely geometric rigid-body docking process that focuses
on localised interaction sites, called target sites. These sites
represent regions in space where ligand atoms with constrained
directionality should be found. The docking method combines several
algorithms: a binary search technique combined with least-squares
fitting, and analytical and numerical optimisation.
3D chemical structure generation is a combinatorial problem. The
problem space is explored by graph searching techniques that use
heuristics to prune the graph and so reduce the combinatorial explosion.
The number of examined graph nodes is further reduced by a novel
bidirectional search technique.
Some functional groups are docked at the target sites prior to
structure generation. Then two groups of target sites are selected and
structures are generated to connect them. The resulting set of
structures is used as the starting point for another connection phase.
Pairwise connection phases are applied until a final set of structures
satisfies all the desired target sites.
In a connection phase, structures are grown from two opposite
directions at the same time and the halves are connected in the middle
of the cavity. A Breadth First Search (BFS) is applied from one side,
then a Depth First Search (DFS) is applied from the other side to
generate all the structures that can be connected to any of the
structures on the first side.

The figure above illustrates the part of the problem space that is
explored by the method (grey shaded) and also the saving (stripes)
compared to a single graph search. Let $n$ denote the number of levels
and $s$ the number of successors of a node. A single graph search would
examine $s^n$ nodes, whereas the simultaneous search grows each side
only $n/2$ levels deep and so examines only $2s^{n/2}$ nodes. For
example, if $s = 20$ and $n = 6$, then $s^n = 64{,}000{,}000$ but
$2s^{n/2} = 16{,}000$.
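The node counts and the meet-in-the-middle idea can be illustrated with a toy model. The sketch below is a hypothetical simplification, not SPROUT's code: each "template" is just an integer step length from the assumed set `STEPS`, one target site sits at 0 and the other at `goal`, and two halves connect when their free ends coincide (SPROUT instead overlaps a common template and re-docks the result). The BFS half is indexed by its free end so that every DFS node grown from the other side can be matched by a lookup.

```python
from collections import defaultdict

# Hypothetical toy templates: each one extends a chain by an integer step.
STEPS = (1, 2, 3)


def nodes_single(s: int, n: int) -> int:
    """Nodes on the deepest level of a one-directional search of depth n."""
    return s ** n


def nodes_bidirectional(s: int, n: int) -> int:
    """Two searches of depth n/2 each (n assumed even)."""
    return 2 * s ** (n // 2)


def bfs_half(start: int, depth: int) -> dict:
    """Grow level by level from one site; index the last level by free end."""
    level = [(start, ())]
    for _ in range(depth):
        level = [(pos + s, path + (s,)) for pos, path in level for s in STEPS]
    table = defaultdict(list)
    for pos, path in level:
        table[pos].append(path)
    return table


def dfs_half(goal: int, depth: int, table: dict) -> list:
    """Grow depth-first from the other site; join with matching BFS halves."""
    solutions = []
    stack = [(goal, ())]
    while stack:
        pos, path = stack.pop()
        if len(path) == depth:
            # The DFS half was grown backwards from the goal, so reverse it.
            for left in table.get(pos, ()):
                solutions.append(left + tuple(reversed(path)))
            continue
        for s in STEPS:
            stack.append((pos - s, path + (s,)))
    return solutions


print(nodes_single(20, 6), nodes_bidirectional(20, 6))  # 64000000 16000
chains = dfs_half(6, 2, bfs_half(0, 2))
print(len(chains))  # 10 four-template chains spanning a distance of 6
```

Only the BFS side has to be stored; the DFS side is enumerated with a stack and discarded as it goes, which is one plausible reason for pairing the two strategies.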
The structures are represented by vertices, which are defined by
hybridisation (hence geometry) but not by atom type, and by bonds, which
are defined by bond type (single, double, aromatic, etc.). The partial
structures are grown by joining small fragments, called templates, to
the seed vertex of the existing structure (initially the docked starting
functional group, later a partial structure that already consists of
several templates). Three different join types are applied in SPROUT:

A predefined discrete sampling of dihedral angles (representing
low-energy conformations) is applied about each new bond join. The
template library consists of 3- to 6-membered rings and sp3 and sp2
atoms (for building chains).
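A bond join can be viewed as the successor function of the graph search: every template in the library, attached at every sampled dihedral angle, yields one child node, so the branching factor $s$ is roughly the library size times the number of dihedral samples. The sketch below is a hypothetical illustration; the template names, the dihedral set `DIHEDRALS` and the `accept` pruning hook (standing in for docking and steric checks) are assumptions, and real templates would carry 3D geometry.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple

# Hypothetical library mirroring the poster: 3- to 6-membered rings plus
# sp3 and sp2 chain atoms. Real templates carry coordinates and join vertices.
TEMPLATES = ("cyclopropane", "cyclobutane", "cyclopentane",
             "cyclohexane_chair", "benzene", "sp3_atom", "sp2_atom")

# Assumed low-energy dihedral samples (degrees) about each new bond.
DIHEDRALS = (60.0, 180.0, 300.0)


@dataclass(frozen=True)
class Partial:
    templates: Tuple[str, ...]
    dihedrals: Tuple[float, ...]


def bond_joins(partial: Partial,
               accept: Callable[[Partial], bool] = lambda p: True) -> Iterator[Partial]:
    """Yield every child formed by joining one template at one sampled dihedral."""
    for template in TEMPLATES:
        for phi in DIHEDRALS:
            child = Partial(partial.templates + (template,),
                            partial.dihedrals + (phi,))
            # In SPROUT each child would be docked; rejected children are pruned.
            if accept(child):
                yield child


seed = Partial(("carboxylate",), ())     # hypothetical docked starting group
print(len(list(bond_joins(seed))))       # 21 = 7 templates x 3 dihedrals
```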
A connection is made between a pair of partial structures originating
from different target sites. The candidate partial structures must have
a template in common, which is overlapped during the connection. All
possible pairs are examined in turn. Each resulting structure is
positioned by the geometric docking method to satisfy all the target
sites of both partial structures without violating the steric
constraints of the receptor site. If no such position and orientation
exists, the structure is rejected. The figure below shows an example
of a connection:

The interaction sites are represented by 3D geometric regions,
calculated from distance and angle tolerances for an expected
interaction with a given receptor atom. For example, a hydrogen bond
acceptor region is generated for each hydrogen bond donor of the
receptor site to ensure the complementarity required for molecular
recognition. The minimum and maximum distances between the acceptor
atom and the hydrogen, together with the minimum hydrogen bond angle,
define the volume within which the ligand acceptor atom should lie.
Similarly, geometric regions are defined for ligand donors and
hydrogens that interact with acceptor atoms in the receptor site.
Geometric regions for metal ion interactions are also defined by bond
angle and distance tolerances. Examples of these geometric regions are
shown below:

Specific, strong interactions are observed when a ligand atom forms
multicentred or bifurcated hydrogen bonds to the receptor site. SPROUT
can represent eight different compound hydrogen bonding situations by
appropriate geometric regions. One of these is a double interaction in
which an OH group donates its proton to an acceptor atom of the receptor
site while accepting another hydrogen bond from a donor atom of the
receptor. A geometric region representing the volume within which the
OH oxygen can be placed to provide this situation is shown on the right.
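The region definitions above translate into point-membership tests. The sketch below assumes illustrative tolerances (1.5 to 2.5 Å for the H...A distance, at least 120° for the D-H...A angle, 2.5 to 3.2 Å for a donor O...acceptor heavy-atom distance); these values and all function names are hypothetical, not SPROUT's parameters. The compound OH region is approximated as the intersection of an acceptor region with a simplified donor region (a distance shell only, ignoring where the OH hydrogen sits), sampled on a grid.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def acceptor_region(donor: Point, hydrogen: Point, d_min: float = 1.5,
                    d_max: float = 2.5, min_angle: float = 120.0) -> Callable[[Point], bool]:
    """Region where a ligand acceptor can accept from the receptor donor D-H."""
    def inside(p: Point) -> bool:
        d = math.dist(p, hydrogen)
        if not d_min <= d <= d_max:
            return False
        hd = [a - b for a, b in zip(donor, hydrogen)]
        hp = [a - b for a, b in zip(p, hydrogen)]
        cos_a = sum(x * y for x, y in zip(hd, hp)) / (math.hypot(*hd) * d)
        # D-H...A angle measured at the hydrogen; clamping guards rounding error.
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_a)))) >= min_angle
    return inside


def distance_shell(centre: Point, r_min: float, r_max: float) -> Callable[[Point], bool]:
    """Simplified donor region: heavy-atom distance to a receptor acceptor."""
    return lambda p: r_min <= math.dist(p, centre) <= r_max


def sample_region(tests: Sequence[Callable[[Point], bool]], lo: float,
                  hi: float, step: float) -> List[Point]:
    """Grid points satisfying every test, i.e. the intersection of the regions."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return [p for p in itertools.product(axis, repeat=3)
            if all(test(p) for test in tests)]


# Receptor donor N-H pointing along +x, receptor acceptor further along x.
oh_region = sample_region(
    [acceptor_region(donor=(-1.0, 0.0, 0.0), hydrogen=(0.0, 0.0, 0.0)),
     distance_shell(centre=(4.5, 0.0, 0.0), r_min=2.5, r_max=3.2)],
    lo=-5.0, hi=5.0, step=0.25)
print(len(oh_region) > 0, (2.0, 0.0, 0.0) in oh_region)  # True True
```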
The Least Squares Fit (LSF) technique is suitable for overlapping two
sets of points in 3D. It can also be used to place atoms within given
spheres by applying a weighting scheme that reflects the differences in
the radii of the spheres.
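A weighted least-squares fit can be written with the Kabsch (SVD) method, minimising $\sum_i w_i \lVert R x_i + t - y_i \rVert^2$ over rotations $R$ and translations $t$. The poster does not state its weighting scheme; the sketch below assumes weights proportional to $1/r_i^2$, so that atoms aimed at small spheres are pulled harder than atoms aimed at large ones. The function names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np


def sphere_weights(radii):
    """Assumed weighting: tighter spheres (smaller radius) get larger weight."""
    r = np.maximum(np.asarray(radii, dtype=float), 1e-3)
    return 1.0 / r ** 2


def weighted_fit(mobile, target, weights):
    """Rotation R and translation t minimising sum_i w_i |R x_i + t - y_i|^2."""
    X = np.asarray(mobile, dtype=float)
    Y = np.asarray(target, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    cx, cy = w @ X, w @ Y                      # weighted centroids
    H = (X - cx).T @ (w[:, None] * (Y - cy))   # weighted 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cy - R @ cx
    return R, t


# Place three atoms into three spheres; the 0.5 radius sphere counts least.
atoms = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]]
centres = [[5.0, 0.2, 0.0], [6.5, 0.0, 0.0], [5.0, 1.5, 0.1]]
R, t = weighted_fit(atoms, centres, sphere_weights([0.5, 0.1, 0.1]))
placed = np.asarray(atoms) @ R.T + t
```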
A hierarchical enclosing sphere system is defined for each target site
region. The outermost sphere encloses the whole region. The region is
cut into two halves by a plane perpendicular to its longest dimension
(see figure below), and an enclosing sphere is generated for each half.
The procedure is repeated for both parts until the radius of the
enclosing sphere is smaller than the desired resolution (e.g. a radius
of 0.1 Å).
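The recursive halving can be sketched as follows, assuming (hypothetically) that a target region is available as a cloud of sampled points, for example the grid points produced by a region-membership test. The enclosing sphere used here (bounding-box centre, radius to the farthest point) always encloses its points but is not the minimal enclosing sphere; the class and function names are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class SphereNode:
    centre: Point
    radius: float
    children: List["SphereNode"] = field(default_factory=list)


def enclosing_sphere(points):
    """Bounding-box centre and the radius reaching the farthest point."""
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    centre = tuple((a + b) / 2 for a, b in zip(lo, hi))
    radius = max(math.dist(centre, p) for p in points)
    return centre, radius, lo, hi


def build_hierarchy(points, resolution=0.1):
    """Recursively halve a sampled region across its longest dimension."""
    centre, radius, lo, hi = enclosing_sphere(points)
    node = SphereNode(centre, radius)
    if radius < resolution or len(points) < 2:
        return node
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    cut = (lo[axis] + hi[axis]) / 2   # plane perpendicular to the longest side
    node.children = [
        build_hierarchy([p for p in points if p[axis] <= cut], resolution),
        build_hierarchy([p for p in points if p[axis] > cut], resolution),
    ]
    return node


def depth(node):
    """Number of levels in the deepest branch of the hierarchy."""
    return 1 + max((depth(c) for c in node.children), default=0)


def leaves(node):
    """Smallest spheres of the hierarchy."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Since the longest side has non-zero extent whenever the radius exceeds the resolution, both halves are always non-empty and the recursion terminates.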

The structure is positioned to satisfy the target sites by iterative
application of LSF, fitting the covering vertices to the centres of the
spheres in the hierarchical representation. In the first iteration,
the outermost enclosing sphere is used for each target region. Then,
for each site, the sphere on the second level that is closer to the
current position of the covering vertex is selected. The procedure is
iterated down through the hierarchy until the leaf nodes are reached.
The number of iterations is equal to the number of levels in the
deepest hierarchy, i.e. roughly the base-2 logarithm of the number of
smallest spheres, since each sphere is split into two.
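The descent can be sketched by alternating a weighted rigid fit with a one-level step down each site's hierarchy. This is a hypothetical reconstruction under stated assumptions: the `Sphere` class, the $1/r^2$ weights and the choice of the nearest child are illustrative, the fit is a weighted Kabsch fit, and NumPy is assumed to be available.

```python
import numpy as np


class Sphere:
    """Node of a target-site sphere hierarchy (hypothetical structure)."""
    def __init__(self, centre, radius, children=()):
        self.centre = np.asarray(centre, dtype=float)
        self.radius = float(radius)
        self.children = list(children)


def fit(X, Y, w):
    """Weighted Kabsch fit: R, t minimising sum_i w_i |R x_i + t - y_i|^2."""
    w = w / w.sum()
    cx, cy = w @ X, w @ Y
    U, _, Vt = np.linalg.svd((X - cx).T @ (w[:, None] * (Y - cy)))
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cy - R @ cx


def hierarchical_dock(vertices, roots):
    """Fit covering vertices to sphere centres, descending one level per pass.

    vertices[i] is the covering vertex for the site whose hierarchy is roots[i].
    """
    X = np.asarray(vertices, dtype=float)
    nodes = list(roots)
    while True:
        centres = np.array([n.centre for n in nodes])
        weights = np.array([1.0 / max(n.radius, 1e-3) ** 2 for n in nodes])
        R, t = fit(X, centres, weights)
        placed = X @ R.T + t
        if not any(n.children for n in nodes):
            return R, t, placed
        # For each site keep the child sphere nearest to its current vertex;
        # sites whose hierarchy is already exhausted keep their leaf.
        nodes = [min(n.children, key=lambda c: np.linalg.norm(c.centre - p))
                 if n.children else n
                 for n, p in zip(nodes, placed)]
```

With fewer than three sites the rotation is not fully determined by the fit, which matches the poster's separate treatment of mobile structures in the second docking phase.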
The second phase of the docking resolves boundary violations and
orients mobile structures as close as possible to the goal target
sites. A structure is mobile if it is anchored to fewer than three
target sites.
The method is based on numerical optimisation techniques applied to a
penalty function. The penalty function consists of three components:

$S_t$ = distance between the covering vertex and target site $t$ (zero if satisfied).
$B_v$ = distance of vertex $v$ from the boundary surface if it is outside, otherwise zero.
$G_v$ = distance of vertex $v$ from the goal target site.
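The poster does not give the formula that combines the three components, so the sketch below assumes a weighted sum $P = w_S \sum_t S_t + w_B \sum_v B_v + w_G G_g$ with arbitrary weights and a single designated goal vertex $g$. It also assumes a spherical toy cavity in place of precalculated grid distances and uses translation-only compass search as a stand-in numerical optimiser. All names and values are hypothetical.

```python
import math


def penalty(vertices, sites, cavity_radius, goal, goal_vertex,
            weights=(10.0, 10.0, 1.0)):
    """Toy penalty: unmet target sites, boundary violations, goal distance.

    sites: (vertex_index, site_centre, site_radius) for each covered target.
    The cavity is assumed to be a sphere of cavity_radius about the origin.
    """
    s_term = sum(max(0.0, math.dist(vertices[i], c) - r) for i, c, r in sites)
    b_term = sum(max(0.0, math.dist(v, (0.0, 0.0, 0.0)) - cavity_radius)
                 for v in vertices)
    g_term = math.dist(vertices[goal_vertex], goal)
    w_s, w_b, w_g = weights
    return w_s * s_term + w_b * b_term + w_g * g_term


def translate(vertices, d):
    return [tuple(a + b for a, b in zip(v, d)) for v in vertices]


def compass_search(f, x0, step=1.0, tol=1e-3):
    """Derivative-free minimisation: try +/- step on each axis, halve when stuck."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for axis in range(len(x)):
            for sign in (1.0, -1.0):
                y = x.copy()
                y[axis] += sign * step
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2
    return x, fx


verts = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
objective = lambda d: penalty(translate(verts, d), [(0, (0.0, 0.0, 0.0), 0.5)],
                              5.0, (1.0, 0.0, 0.0), 1)
shift, best = compass_search(objective, (0.0, 0.0, 0.0))
print(shift, best)  # [-3.0, 0.0, 0.0] 0.0
```

SPROUT reads these distances from precalculated grids rather than computing them analytically, and a real optimiser would also vary the orientation of mobile structures.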
The distances are precalculated and stored in a grid. The shape of the
cavity is taken into account, i.e. the shortest route is calculated
within the available volume (avoiding boundary violations). The
precalculation uses a flood algorithm combined with Dijkstra's
algorithm to calculate 'quasi-cubic' distances. Cubic distances are
measured along the main axis directions only, which can coarsely
overestimate the Euclidean distance. The novel quasi-cubic distances
are generated by a flood that also progresses in diagonal directions,
using different step increments for axis directions (10 units), plane
diagonals (14 units) and 3D diagonals (17 units). These increments
approximate 10, 10√2 ≈ 14.1 and 10√3 ≈ 17.3, the Euclidean lengths of
the corresponding grid steps scaled by 10.

p-amidino-phenyl-pyruvate (APPA):
SPROUT was set up to generate molecular structures that have an amidino
and a carboxy group in the key interaction positions involved in APPA
binding. The bound-state (sp3) conformation of the carbonyl group (at
O1), which forms a covalent bond to the serine oxygen, was also
required. Using chair cyclohexane and benzene as the only ring spacers,
the program generated 7275 partial structures for this highly
constrained input in 2.5 minutes. 524 structures were successfully
docked and 9 final solutions were found. Some of the final solutions
are shown below:

Solution #2 has an equivalent 2D structure to the bound state of APPA.
The generated 3D conformation is shown (blue) in the following figure,
overlaid with the bound conformation of APPA (yellow). The receptor
atoms around the binding site are also shown.

The natural ligand, GDP (guanosine diphosphate):
SPROUT was set up to mimic the guanine and β-phosphate groups of GDP,
because these groups are known to interact strongly with the receptor.
The program generated 525,432 partial structures during the run (3
hours and 23 minutes), of which 50,855 were successfully docked. In
total, 177 final solutions were found that had atoms in appropriate
positions and orientations for the expected interactions. Some of these
are shown below:

Solution #117 (blue) is shown below in the context of the receptor
site, overlaid on GDP (yellow). The target site regions are
highlighted on the figure.

A fast de novo structure generation method coupled with a geometric
docking method has been developed and implemented. The generation is
based on graph searching and applies a combination of BFS and DFS
strategies. The search is exhaustive within the bounds defined by the
template set, conformational sampling and limiting parameters. This
feature is unique to SPROUT among the large number of de novo drug
design programs. Although methods using random steps can quickly give
some promising solutions, there is no guarantee that they will find all
reasonable solutions, so the optimal drug candidate might easily be
missed.
It was shown that the program is able to generate solutions that have
equivalent 2D structures and very similar 3D conformations to known
bound ligands. SPROUT can also generate a large variety of promising
novel structures.
Future work plans for the project include the ability to start the
search from known fragments and let the program extend the structure to
satisfy additional target sites, and also to provide better handling of
hydrophobic sites.