SPIDeR

Structure Production with Interactive Design of Results


Introduction

SPIDeR, the structure generation module of the SPROUT Toolkit, aims to generate skeletons or molecular graphs that satisfy steric constraints.

Figure: An example of the constraints

The steric constraints consist of a boundary, usually defined by the solvent accessible surface of a receptor site, and a set of target sites. Target sites are small regions of space that model localised interactions between the growing ligand and the receptor site. See HIPPO for more details about the target sites.


The Templates

Skeletons are built by the stepwise joining of small molecular fragments, called templates. Templates are 3D molecular graphs where the edges of a graph represent chemical bonds and the vertices of a graph represent generalised atoms, i.e., they are defined by hybridisation state but not element type.

Figure: Templates = 3D molecular graphs
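A template as described above could be held in a small graph structure. The following is only a minimal sketch: the class names, the two-value hybridisation enum, and the example coordinates are assumptions made for illustration, not the SPROUT data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Hybridisation(Enum):
    # Only the states named in the text; a real library may need more.
    SP3 = "sp3"
    SP2 = "sp2"


@dataclass(frozen=True)
class Vertex:
    """A generalised atom: hybridisation state and position, no element type."""
    hybridisation: Hybridisation
    position: tuple[float, float, float]


@dataclass
class Template:
    """A 3D molecular graph: vertices are generalised atoms, edges are bonds."""
    name: str
    vertices: list[Vertex]
    edges: set[frozenset[int]] = field(default_factory=set)

    def add_bond(self, i: int, j: int) -> None:
        n = len(self.vertices)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError("a bond must join two distinct existing vertices")
        self.edges.add(frozenset((i, j)))

    def neighbours(self, i: int) -> list[int]:
        """Indices of the vertices bonded to vertex i."""
        return sorted(j for e in self.edges if i in e for j in e if j != i)


# A hypothetical two-atom acyclic template (illustrative coordinates).
chain2 = Template(
    "chain2",
    [Vertex(Hybridisation.SP3, (0.0, 0.0, 0.0)),
     Vertex(Hybridisation.SP3, (1.54, 0.0, 0.0))],
)
chain2.add_bond(0, 1)
```

Storing each bond as an unordered pair keeps the graph undirected, so a bond added as (0, 1) is also found when asking for the neighbours of vertex 1.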

Template joining operations include the fusing of templates, spiro joining templates and forming a new single bond between two templates.

Figure: Template joining

When a new single bond is formed, a number of conformations are produced about that bond. The symmetry of templates is taken into account, and a number of template joining rules exist to increase the efficiency of the program and to prevent the formation of unlikely substructures.

The template library consists of chain and ring templates. Acyclic templates of between 1 and 4 atoms are included in the library, and they can be joined into larger fragments by forming new bonds between them. Thus, any chain structure of sp3 and sp2 atoms can be built. The ring templates are listed below. Some of them are represented by more than one conformation. In these cases, the number of conformations is displayed inside the ring.

Figure: 3-, 4- and 5-membered rings

Figure: 6-membered rings (four images)


The skeleton generation algorithm

The method of structure generation used in the first version of SPROUT has been published [2]. More recently, we have developed a new method for structure generation. This method is summarised here and will be described in more detail in the near future [3]. The program has been tested and is currently being used by several pharmaceutical companies.

  1. Templates are positioned at all of the target sites prior to skeleton generation. These templates become partial skeletons (see EleFAnT for more details).
  2. Partial skeletons are grown outwards from each of the target sites by the stepwise joining of templates.
  3. Partial skeletons originating from different target sites are connected by superimposing a template that is common to both partial skeletons.
  4. The partial skeletons are oriented by a geometrical docking method after each joining or connection operation.
  5. The generated structures undergo a series of tests to check if they satisfy the user defined parameters.

Problem space representation

The problem space for skeleton generation is represented by a number of trees, i.e. a forest of trees. Each tree in the forest is associated with either a single target site or a group of target sites. Each node (branch junction or leaf) of a tree represents a partial skeleton that satisfies the target sites covered by the tree. The roots of the trees represent the target sites.

Figure: Schematic representation of the problem space as a forest of trees
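The forest representation described above can be sketched with two small classes. This is an illustrative assumption about the layout, not the program's own data structure: `Node`, `Tree`, `initial_forest`, and the string labels standing in for partial skeletons are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A partial skeleton; `label` is a stand-in for real 3D structure data."""
    label: str
    children: list["Node"] = field(default_factory=list)

    def expand(self, label: str) -> "Node":
        """Add a child node, i.e. a skeleton obtained by one joining step."""
        child = Node(label)
        self.children.append(child)
        return child

    def leaves(self) -> list["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


@dataclass
class Tree:
    """One tree of the forest, tied to the target site(s) it covers."""
    target_sites: frozenset[str]
    root: Node


def initial_forest(site_names: list[str]) -> list[Tree]:
    """Start with one single-node tree per target site."""
    return [Tree(frozenset({name}), Node(f"site:{name}")) for name in site_names]


forest = initial_forest(["donor", "acceptor", "covalent"])
first = forest[0].root
first.expand("site:donor + ring6")
first.expand("site:donor + chain1")
```

Keeping the covered target sites on the tree rather than on each node reflects the statement that every node of a tree satisfies the same set of sites.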

Multiphase graph search

The trees of the forest are explored in tree pair connection phases. Each tree pair connection phase takes two trees as input and results in a single combined tree. Thus, as the search progresses, the number of trees used to represent the problem decreases until finally, when all of the search space has been explored, the results are represented by one tree, the solution tree. A tree pair connection phase consists of:

  1. selecting the two trees to connect;
  2. performing a Breadth First Search (BFS) on the first tree (the BF tree);
  3. performing a Depth First Search (DFS) on the second tree (the DF tree);
  4. replacing the two source trees by the combined tree of the connection phase.

In the BFS phase, the BF tree is grown by applying join operations in each node expansion until all the leaves span at least half the distance between the target sites of the BF and DF trees.

Figure: Expansion

During the DFS phase the nodes in the DF tree are expanded. Following each node expansion, a connection is attempted between each expanded node in the DF tree and all the nodes of the BF tree that have a template common to the expanded node. The successful connections result in new nodes in the combined tree.

Figure: Connection
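The interplay of the BFS and DFS phases can be illustrated with a deliberately simplified toy model. Everything here is an assumption for illustration: geometry is reduced to one dimension, each template has a made-up "length", a partial skeleton is a tuple of template names, and a connection succeeds when the connected skeleton spans the distance between the two target sites within a tolerance. The real program works on 3D structures and docks them after each step.

```python
from collections import deque

# Hypothetical template lengths in arbitrary units (1D stand-in for 3D geometry).
TEMPLATES = {"chain1": 1.0, "ring5": 2.0, "ring6": 3.0}


def span(skeleton):
    """Distance covered by a partial skeleton (sum of its template lengths)."""
    return sum(TEMPLATES[t] for t in skeleton)


def grow_bf_tree(half_distance):
    """BFS phase: expand level by level until every leaf spans half the distance."""
    nodes, queue = [], deque([()])          # () is the root at the target site
    while queue:
        node = queue.popleft()
        if node:
            nodes.append(node)
        if span(node) < half_distance:      # still too short: expand this node
            queue.extend(node + (t,) for t in TEMPLATES)
    return nodes


def connect_trees(distance, tolerance=0.25):
    """DFS phase: expand the DF tree and try to connect each new node to BF nodes."""
    bf_nodes = grow_bf_tree(distance / 2)
    combined = set()

    def dfs(node):
        for t in TEMPLATES:
            child = node + (t,)
            if span(child) > distance:      # cannot fit between the two sites
                continue
            for bf in bf_nodes:
                if bf[-1] != t:             # need a common template to superimpose
                    continue
                # The shared template is counted once in the connected skeleton.
                total = span(bf) + span(child) - TEMPLATES[t]
                if abs(total - distance) <= tolerance:
                    combined.add(bf + tuple(reversed(node)))
            dfs(child)

    dfs(())
    return combined


solutions = connect_trees(4.0)
```

Growing one tree breadth-first to the halfway point and exploring the other depth-first means only the BF tree has to be kept in memory in full, while each DF node can be tested for connections as soon as it is created.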


Geometric docking

The skeletons resulting from the connection of partial skeletons are positioned to satisfy the geometric constraints of the target sites by a geometric docking process. A connected skeleton must be positioned so that: it covers all the target sites that were covered by the individual partial skeletons; none of its vertices violate the boundary; and its position is optimised relative to the remaining target sites. The algorithm that is applied is a Directional Least Squares Fit (DLSF). It is based on [1], but is extended to optimise the directions of bonds as well as the positions of the vertices. The algorithm also applies some wriggling (small rotations) and translations to attempt to reach the target sites that are still uncovered.

Both the connected and the expanded skeletons are optimised within the cavity to avoid violating the solvent accessible surface and to reach the closest possible position to the goal target site(s) without losing contact with the satisfied target sites. This positioning procedure consists of the same steps outlined above, but also uses additional conditions and parameters.

The speed of the docking process is very important, as it is applied to each of the partial skeletons generated during the combinatorial search. The method is very fast (it processes hundreds of skeletons per second on a Silicon Graphics Indy 4000) because it is purely geometry based and does not perform any energy calculation. The distance calculation is accelerated by a precalculated quasi-cubic distance grid.
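The idea of a precalculated distance grid can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a plain regular cubic grid with nearest-grid-point lookup (the "quasi-cubic" layout of the actual program is not described here), and the class name, constructor arguments, and boundary points are hypothetical.

```python
import math
from itertools import product


class DistanceGrid:
    """Precalculated distances from grid points to the nearest boundary point."""

    def __init__(self, boundary_points, origin, spacing, shape):
        self.origin, self.spacing, self.shape = origin, spacing, shape
        self.values = {}
        # One-off cost: visit every grid point and store its boundary distance.
        for index in product(*(range(n) for n in shape)):
            point = tuple(o + i * spacing for o, i in zip(origin, index))
            self.values[index] = min(math.dist(point, b) for b in boundary_points)

    def lookup(self, point):
        """Approximate boundary distance: the value at the nearest grid point."""
        index = tuple(
            min(max(round((p - o) / self.spacing), 0), n - 1)
            for p, o, n in zip(point, self.origin, self.shape)
        )
        return self.values[index]


grid = DistanceGrid(
    boundary_points=[(0.0, 0.0, 0.0)],
    origin=(0.0, 0.0, 0.0),
    spacing=1.0,
    shape=(4, 4, 4),
)
```

After the grid is built, each query costs a few arithmetic operations and one table lookup, regardless of how many points define the boundary. The price is an error of up to about one grid spacing, which a real implementation might reduce by interpolation.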


Parameters

There are a number of parameters that can be used to limit the large diversity of structures that are possible solutions. These limits also help to reduce the combinatorial explosion. A brief description of some of these parameters is given below.

Vertex limit

The maximum number of heavy (non-hydrogen) atoms in a skeleton

Ring-3 limit

The maximum number of 3-membered rings in a skeleton

Ring-4 limit

The maximum number of 4-membered rings in a skeleton

Ring-5 limit

The maximum number of 5-membered rings in a skeleton

Ring-6 limit

The maximum number of 6-membered rings in a skeleton

Chain length

The maximum number of consecutive acyclic bonds in a skeleton

Rotatable bonds

The maximum number of rotatable bonds in a skeleton

Spiro joins

The maximum number of spiro joins in a skeleton

Fuse joins

The maximum number of fused bonds in a skeleton

Ring ratio

The minimum percentage of ring vertices required in a skeleton

Van der Waals energy cut-off

The maximum allowed intra-molecular VdW interaction energy (kJ/mol)

Strain energy cut-off

The maximum allowed conformational strain energy (kJ/mol)

Rotatable bond penalty

The energy penalty for each rotatable bond (added to the strain energy)

Accessible surface tolerance

The probe radius of the accessible surface for skeleton generation
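A few of the parameters above can be combined into a simple acceptance check, sketched below. The class and field names are hypothetical. The vertex, ring and ring ratio values are those quoted in the Test results section, with the ring ratio treated as a fraction. The strain energy cut-off and rotatable bond penalty values are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_vertices: int = 20               # vertex limit (value from the test run)
    max_ring5: int = 1                   # ring-5 limit (value from the test run)
    max_ring6: int = 2                   # ring-6 limit (value from the test run)
    min_ring_ratio: float = 0.33         # ring ratio, expressed as a fraction
    strain_cutoff: float = 50.0          # kJ/mol, hypothetical value
    rotatable_bond_penalty: float = 2.0  # kJ/mol per bond, hypothetical value


@dataclass
class SkeletonStats:
    vertices: int                        # heavy atoms; assumed to be at least 1
    ring_vertices: int
    ring5: int
    ring6: int
    rotatable_bonds: int
    strain_energy: float                 # kJ/mol, before the rotatable bond penalty


def violations(stats: SkeletonStats, limits: Limits) -> list[str]:
    """Return the names of the limits that a skeleton breaks (empty if none)."""
    broken = []
    if stats.vertices > limits.max_vertices:
        broken.append("vertex limit")
    if stats.ring5 > limits.max_ring5:
        broken.append("ring-5 limit")
    if stats.ring6 > limits.max_ring6:
        broken.append("ring-6 limit")
    if stats.ring_vertices / stats.vertices < limits.min_ring_ratio:
        broken.append("ring ratio")
    # The penalty for each rotatable bond is added to the strain energy.
    penalised = (stats.strain_energy
                 + stats.rotatable_bonds * limits.rotatable_bond_penalty)
    if penalised > limits.strain_cutoff:
        broken.append("strain energy cut-off")
    return broken
```

For example, a skeleton with 18 vertices, 6 of them in one 6-membered ring, 4 rotatable bonds and 30 kJ/mol of strain passes every check under these illustrative limits, because the penalised strain is 30 + 4 × 2 = 38 kJ/mol.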


User interactions during skeleton generation

The user can interrupt the search process, browse the search trees with graphical tools to monitor progress, and interact with the search itself. The search can be stopped after any node expansion, and there are then many possibilities for guiding the search. For example:

  • The order in which the trees are processed can be altered.
  • To speed up the search, a tree pair connection phase can be aborted before completion; the current BF and DF trees are deleted and processing resumes with the next tree connection, giving a subset of the possible solutions.
  • Nodes can be pruned from any of the trees at any time. Pruning a node also results in the pruning of all its successor nodes.
  • The set of spacer templates currently in use can be altered at any time during the search by removing templates or adding new ones to the set.
  • The operations that can be performed on a node, or on several nodes, in a tree can be altered, e.g. by preventing certain types (fusion/spiro/new bond) of joining of new templates to partial skeletons.
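The pruning rule in the list above (removing a node also removes all of its successors) can be sketched on a tree stored as a map from each node to its children. The function name and the dictionary layout are assumptions made for this example only.

```python
def prune(children, node):
    """Remove `node` and all of its successors from a child-map tree.

    `children` maps each node name to the list of its child node names.
    Returns the names of the removed nodes.
    """
    removed = []
    stack = [node]
    while stack:
        current = stack.pop()
        removed.append(current)
        # Dropping the entry also queues the successors for removal.
        stack.extend(children.pop(current, []))
    # Detach the pruned node from its parent.
    for kids in children.values():
        if node in kids:
            kids.remove(node)
    return removed


tree = {"root": ["a", "b"], "a": ["a1", "a2"], "a1": [], "a2": [], "b": []}
removed = prune(tree, "a")
```

After the call, `tree` holds only the root and node "b", and `removed` lists "a" together with its two successors.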

At one extreme, a specific skeleton can be interactively built by manually specifying individual templates and joining operations. In practice, interaction is more useful at a higher level.


Test results

A test run was performed to generate ligands that can bind to the APPA binding site of trypsin. Five target sites (generated by HIPPO) were used: two compound donor sites where the amidine group nitrogen atoms of APPA are bound, an acceptor site and a dual (donor and acceptor) site at the carboxy group of APPA, and a covalent site. The selected template set included chair cyclohexane, five- and six-membered aromatic rings and acyclic templates. The minimum ring ratio was set to 0.33, the number of vertices was limited to 20, the number of 5-membered rings to 1 and the number of 6-membered rings to 2.

The run required 31 minutes and 18 seconds of CPU time on a Silicon Graphics Indy 4000 machine. The program generated 20 solutions that satisfy all the target sites and the steric and parametric constraints. The solutions are listed below. A large colour image of each solution in the protein environment, with the target sites, is available through the corresponding image# link (# is the number of the skeleton). The PDB# links lead to PDB files containing the skeleton together with the receptor site; these can be displayed by the "MIME hyperactive molecule" system.

Figure: SOLUTIONS, the 20 generated skeletons (image1 to image20), each with a corresponding PDB file (PDB1 to PDB20)

Conclusion

The first version of SPROUT [2] used a different algorithm for structure generation that samples the problem space by choosing discrete but fixed orientations for the skeletons. Therefore it was limited to an arbitrary subset of the problem space, and was not able to find solutions for some (theoretically solvable) problems.

The version outlined here (SPROUT2) is more exhaustive as it explores structure space as a continuum. It is able to generate a large number of solutions for a diverse set of problems and is currently being used in a number of laboratories for structure-based drug design.

The version of SPROUT that is currently under development uses a more suitable target site representation derived from the output of HIPPO, so that the structures that are generated are restricted to those that are most likely to bind strongly to the receptor site.

Other future developments of the program will include improvements to the pharmacophore mode representation.


References

  1. D.R. Ferro, J. Hermans, Acta Cryst., A33 (1977) 345.
  2. V. Gillet, A.P. Johnson, P. Mata, S. Sike, P. Williams, J. Comput.-Aided Mol. Design, 7 (1993) 127.
  3. Z. Zsoldos, V.J. Gillet, A.P. Johnson. In Preparation.

Copyright © 2011 SimBioSys Inc., All rights reserved.