Archive for November, 2008

Collective Agreement and Designing a Product Roadmap for ARChem with our Users

Friday, November 14th, 2008

This week we held two design strategy meetings with Life Science companies from the East Coast. One was focused on a new de novo design package that we are working on while the other was for ARChem, our retrosynthetic analysis software platform (http://www.simbiosys.ca/archem/index.html). I’ll comment on the ARChem meeting I led while Zsolt can comment separately on the de novo design meeting he led.

When designing our software we ask our users for feedback on their needs, their biases toward particular scientific approaches, and their thoughts on improving workflow, usability and algorithms. ARChem has been used in large pharma for about 3 years, and our recent installations at new companies, together with the resulting user feedback, have us focused on the next release cycle for the product. With this in mind we chose a different approach to gathering input.

We brought together scientists involved in the original design of ARChem (and therefore experienced users) as well as chemists who had recently trialed the system. With issues regarding proprietary approaches put to one side, we wanted to hear in a public forum what they would like to see implemented in ARChem to satisfy their needs and move it one step closer to being an ideal platform for computer-assisted retrosynthetic analysis. By the end of the meeting we had rank-ordered over 30 specific requests, and the attendees had collectively agreed on the primary issues to address. Some of the requests we had not even considered prior to this meeting. It was definitely one of the best uses of our time and a great design session based on user needs…one to be repeated. We are off to work on the outcomes of the meeting and will keep you informed here of our progress.

posted by Aniko

SYNTHETIC COMPLEXITY AND ACCESSIBILITY: CHOOSING THE CORRECT TOOL FOR THE TASK IN DE NOVO TRIAGE

Monday, November 10th, 2008

There are many instances in which one wants to assess the synthetic accessibility of a set of compounds. It is crucial to choose the right tool for the task. If one has just 10-15 compounds to assess, then the ARChem route designer is the correct choice. ARChem assesses retrosynthetic pathways by considering 10-million+ rules derived from ‘clustering’ extended reaction cores taken from the literature. Often, however, one wants to survey the synthetic accessibility of a large database of ‘virtual compounds’ ranging from hundreds to hundreds of thousands of compounds.

This often arises in the context of de novo design within fragment-based design paradigms. De novo design based on fragment docking and tethering entails: 1) ‘docking’ low molecular-weight/simple-topology fragments into the binding pockets of protein/receptor targets, 2) finding linkers that can effectively tether the fragments, and 3) energy minimization of the assembled de novo compounds in the protein/receptor pocket. Typically one characterizes the polar components (hydrogen bond donors/acceptors and charged groups) and the hydrophobic regions of the binding pocket. These chemical features of the binding site landscape may be exploited in the molecular design process to optimize the affinity of a virtual ligand for a target.

Docking even 10-20 fragments to 3-4 binding site interaction centers (e.g. 2 hydrogen bond donors/acceptors and a hydrophobic region) produces a large number of scored interactions for the fragments. Adding a set of 5-10 different tethering components to tie those fragments together rapidly leads to a combinatorial explosion. For such an approach to be useful, one must do triage on the compounds generated by the de novo design procedure.
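To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It is only a counting model under stated assumptions, not how SPROUT enumerates structures: each site is filled independently by one fragment, the sites are joined in a simple chain, and each link uses one tethering component. The function name and the example inputs are illustrative.

```python
def count_assemblies(n_fragments: int, n_sites: int, n_linkers: int) -> int:
    """Upper bound on assembled compounds under a simple counting model.

    Assumptions (illustrative only, not SPROUT's actual enumeration):
    - each site is filled independently by any one of n_fragments,
    - the sites are joined in a chain, giving n_sites - 1 links,
    - each link uses any one of n_linkers tethering components.
    Multiple docked poses per fragment would push the count higher still.
    """
    if n_sites < 1:
        raise ValueError("need at least one interaction site")
    return n_fragments ** n_sites * n_linkers ** (n_sites - 1)


if __name__ == "__main__":
    # Ranges taken from the text: 10-20 fragments, 3-4 sites, 5-10 linkers.
    for f, s, t in [(10, 3, 5), (20, 3, 10), (20, 4, 10)]:
        total = count_assemblies(f, s, t)
        print(f"{f} fragments, {s} sites, {t} linkers -> {total:,} assemblies")
```

Even the low end of the ranges quoted above (10 fragments, 3 sites, 5 linkers) gives 25,000 assemblies under this model, and the high end with 3 sites gives 800,000.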

SPROUT has been a pioneering tool in the field of de novo design and places emphasis on the pragmatic incorporation of tools to do ‘triage’ as one designs in the binding pocket of the target of interest. While one can do initial triage of the compounds assembled from docked fragments based on predicted binding interaction scores (affinity), this is often not the correct approach. The reason, of course, is that small fragments, or small compounds tethered from just 1-2 fragments, may typically have only 10-100 µM affinity. With the addition of more fragments to the initially designed de novo leads, the affinity may improve substantially. For this reason it is wise to ‘prune’ first based on the synthetic accessibility of the de novo leads.

COMPOUNDTRIAGE
Figure 1 below shows 2 hydrogen bond acceptor sites and a hydrophobic region recognized by the native ligand in the β-secretase crystal structure 2OHP. Selecting several polar fragments and tethering elements in a 4-hour SPROUT run resulted in 93,843 compounds (Figure 2) that interact favorably with the three sites recognized by Compound 3 in the manuscript reporting this crystal structure (J. Med. Chem. 50: 1124, 2007). SPROUT enables one to rapidly derive a total interaction score for each of these compounds, but what we really want to do in the design process is to examine how easily the designed compounds can be synthesized. De novo design of compounds that are not synthetically accessible is a meaningless enterprise.

SimBioSys and KeyModule offer two tractable approaches for such predictions. The first is a tool embodied in SPROUT and in a new product, TOPOMAX, that assesses synthetic ‘complexity’ (Boda and Johnson, J. Med. Chem., 2006, 49:5869-5879). The synthetic complexity approach is based on the compilation of a large number of compounds synthesized over the years and a detailed analysis of their substitution patterns in rings and chains. The concept is essentially an informatics principle: given sufficient sampling of a large space of synthesized compounds, the observed frequency of occurrence of particular substitution and topological patterns should indicate the synthetic accessibility of a de novo compound. In their 2006 manuscript, Boda and Johnson showed how compiling and using a large synthetic complexity database allowed them to score compounds rapidly, and that the synthetic complexity score correlated well with synthetic accessibility computed by CAESA using medchem-encoded PATCHEM rules (Figure 11, J. Med. Chem., 2006, 49:5869-5879).
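The frequency principle can be illustrated with a toy sketch. To be clear, this is not the Boda and Johnson scoring function or the SPROUT/TOPOMAX implementation: the pattern labels, the smoothing and the averaging are all assumptions made for illustration. The idea shown is only that patterns seen often among synthesized compounds contribute little to the score, while rare or unseen patterns contribute a lot.

```python
import math
from collections import Counter


def build_pattern_frequencies(database):
    """Count in how many database compounds each pattern occurs.

    `database` is an iterable of compounds, each given as a list of
    pattern labels (hypothetical strings such as "ring6:1,4-disubst").
    Returns (counts, number_of_compounds).
    """
    counts = Counter()
    n_compounds = 0
    for patterns in database:
        counts.update(set(patterns))  # count each pattern once per compound
        n_compounds += 1
    return counts, n_compounds


def complexity_score(patterns, counts, n_compounds):
    """Mean negative log frequency of a compound's patterns.

    Higher means rarer patterns, taken here as a proxy for 'more complex'.
    Add-one smoothing keeps unseen patterns finite (an assumption of this
    sketch, not of the published method).
    """
    if not patterns:
        return 0.0
    total = 0.0
    for p in patterns:
        freq = (counts.get(p, 0) + 1) / (n_compounds + 1)
        total += -math.log(freq)
    return total / len(patterns)


if __name__ == "__main__":
    # Toy database of three 'synthesized' compounds (labels are made up).
    db = [["ring6:mono", "chain:amide"], ["ring6:mono"], ["ring6:mono", "ring5:fused"]]
    counts, n = build_pattern_frequencies(db)
    print(complexity_score(["ring6:mono"], counts, n))
    print(complexity_score(["ring7:spiro"], counts, n))
```

Because scoring is a lookup and a sum per compound, an approach of this shape scales to very large virtual libraries, which is the practical appeal of a complexity score as a first-pass filter.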

How and why is that useful? Figure 3 shows the synthetic complexity scores computed within SPROUT for 93,843 compounds! The entire computation required just 35 minutes of CPU time on a single processor. SPROUT allows one to do this in the course of a de novo run. Figure 4 shows a small 262-compound sample from the present β-secretase lead-hop run, where we plot the CAESA batch analysis of the compounds' synthetic accessibility against the SPROUT synthetic complexity. The correlation is not perfect, but it is clear that there is, generally speaking, a monotonic trend.
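One common way to put a number on a "monotonic trend" between two scores is a rank correlation such as Spearman's. The sketch below is a generic pure-Python version and is not the analysis behind Figure 4; any score lists passed to it are the reader's own data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

A value near +1 or -1 indicates a strongly monotonic relationship even when the relationship is not linear, which is what matters if the complexity score is used only to rank and prune compounds.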

This spectrum of tools allows one to pare down the number of virtual compounds to hundreds or thousands of virtual ligands using a combination of synthetic complexity and interaction scores in SPROUT (and/or TOPOMAX), with confidence that the complexity triage is indeed linked to the ‘ease of synthesis’. One can then obtain a synthetic accessibility score and complete retrosynthetic pathways to known starting materials for a subset of compounds in a matter of minutes using CAESA batch.
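The funnel described above can be sketched in a few lines of Python. Everything here is hypothetical: the `Candidate` fields, the threshold and the convention that lower values are better for both scores are assumptions for illustration, not the SPROUT, TOPOMAX or CAESA interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    complexity: float   # assumed convention: lower = easier to synthesize
    interaction: float  # assumed convention: lower = better predicted binding


def triage(candidates, max_complexity, keep):
    """Prune on synthetic complexity first, then rank by interaction score.

    Returns at most `keep` candidates, best interaction score first.
    """
    accessible = [c for c in candidates if c.complexity <= max_complexity]
    accessible.sort(key=lambda c: c.interaction)
    return accessible[:keep]


if __name__ == "__main__":
    # Made-up candidates, purely to show the call pattern.
    pool = [
        Candidate("cmpd-a", 2.0, -5.0),
        Candidate("cmpd-b", 9.0, -9.0),  # best score, but too complex
        Candidate("cmpd-c", 1.0, -7.0),
        Candidate("cmpd-d", 3.0, -1.0),
    ]
    for c in triage(pool, max_complexity=3.0, keep=2):
        print(c)
```

Pruning on complexity before ranking on interaction score follows the ordering argued for earlier in this post; the short list that survives is the subset one would hand on for full retrosynthetic analysis.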

This example problem highlights the importance of using the correct tool for the task. De novo design commonly leads to a combinatorial explosion if the user does not perform triage on candidates based on both synthetic complexity (accessibility) and predicted binding. SimBioSys has a range of tools to gauge synthetic accessibility (CAESA and CAESA batch), and docking tools with good Score-log(Kd) predictivity (eHiTS and eHiTS Lightning). The goal of SimBioSys tools is always to achieve speed without compromising accuracy. Look for a new technical note on this topic under Science: White papers on our web site next week!

Posted by Dan Harris

SimBioSys Uses the Beilstein Reaction Database as Part of the ARChem Retrosynthetic Analysis Platform

Friday, November 7th, 2008

Some of you are likely aware of ARChem, our retrosynthetic analysis software (http://www.simbiosys.ca/archem/index.html). ARChem is the result of 4 years of development and grew out of a collaborative project with a major pharmaceutical company. Since then we have delivered the system to a number of other companies, and we have recently submitted a publication to JCIM (http://pubs.acs.org/journals/jcisd8/index.html). It should be in press shortly. We presented on ARChem recently at the ACS meeting and a copy of the presentation is here.
Recently we have been delivering on the needs of some of our users: integrating with the latest ChemDraw ActiveX component (http://www.cambridgesoft.com/software/details/?ds=2&dsv=92), expanding the list of starting material databases supported by ARChem and, our most exciting news, working with the entire Beilstein reaction database (http://en.wikipedia.org/wiki/Beilstein_database). I reported previously that we had been working with the Beilstein database (http://www.simbiosys.ca/blog/2008/05/30/29/). Since then we have reached an agreement with Elsevier to use the entire reaction database to train our clustering algorithms. More about this in the future, but an example image using the Beilstein database is shown below. Notice that an example reaction from the Beilstein database is displayed on the left of the image.

ARChem screen shot

Over the next few weeks you will see us blogging about the future development of ARChem. We are about to have a roundtable meeting with thought leaders from large pharma regarding the future development of ARChem, and we will use the outcomes of this meeting to guide our development during our next coding cycle.

SimBioSys Announces Distribution Network in Japan

Tuesday, November 4th, 2008

Over the past few months we have seen an increasing demand for our software products in various parts of the world. Our eHiTS docking software in particular has attracted a lot of attention in the past couple of years and, with our recent development of eHiTS on the Cell processor (eHiTS Lightning), we have decided to provide sales and support directly through local distributors. We now distribute across the majority of mainland Europe through our distributor ChemCad, and we now have distributors in Japan.

We are happy to announce that Cybernet Systems Co., Ltd. (http://www.cybernet.co.jp/lifescience/) will distribute our full portfolio of software solutions to life science researchers throughout Japan. In addition, Argo Graphics Inc. (http://www.argo-graph.co.jp/) will distribute eHiTS Lightning on IBM’s Cell platform. Cybernet is ideally placed within the Japanese marketplace to take our products to its existing life science customers. Argo Graphics Inc. is ideally placed to take eHiTS Lightning, as a combined hardware and software solution, to its life science IT customers. We look forward to a long and successful relationship with both Cybernet and Argo Graphics.