SYNTHETIC COMPLEXITY AND ACCESSIBILITY: CHOOSING THE CORRECT TOOL FOR THE TASK IN DE NOVO TRIAGE.
There are many instances in which one wants to assess the synthetic accessibility of a set of compounds, and it is crucial to choose the right tool for the task. If one has just 10-15 compounds to assess, then the ARChem route designer is the correct choice. ARChem assesses retrosynthetic pathways by considering 10-million+ rules derived from ‘clustering’ extended reaction cores drawn from the literature. Often, however, one wants to survey the synthetic accessibility of a large database of ‘virtual compounds’ ranging from hundreds to hundreds of thousands of compounds.
This need often arises in de novo design within fragment-based design paradigms. De novo design based on fragment docking and tethering entails: 1) ‘docking’ low-molecular-weight, simple-topology fragments into the binding pockets of protein/receptor targets, 2) finding linkers that can effectively tether the fragments, and 3) energy minimization of the assembled de novo compounds in the protein/receptor pocket. Typically one characterizes the polar components (hydrogen bond donors/acceptors and charged groups) and the hydrophobic regions of the binding pocket. These chemical features of the binding site landscape may be exploited in the molecular design process to optimize the affinity of a virtual ligand for a target.
Docking even a small set of 10-20 fragments to 3-4 binding site interaction centers (e.g. 2 hydrogen bond donor/acceptor sites and a hydrophobic region) leads to a large number of scored interactions for the fragments. Adding a set of 5-10 different tethering components to tie those fragments together rapidly leads to a combinatorial explosion. For such an approach to be useful, one must perform triage on the compounds generated by the de novo design procedure.
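As a rough illustration of this explosion, the sketch below counts candidates under a deliberately simple model. The model is an assumption made here and is not SPROUT's actual enumeration: one fragment is chosen independently for each interaction site, and the sites are joined in a chain by independently chosen tethers. The function name candidate_count is hypothetical.

```python
def candidate_count(n_fragments: int, n_sites: int, n_tethers: int) -> int:
    """Upper-bound count of assembled candidates under a simple assumed model.

    Assumption (not SPROUT's real enumeration): one fragment is picked
    independently per site, and the sites are linked in a chain, so
    n_sites - 1 tethers are picked independently.
    """
    if n_sites < 1:
        raise ValueError("need at least one site")
    return n_fragments ** n_sites * n_tethers ** (n_sites - 1)


# Ranges quoted in the text: 10-20 fragments, 3-4 sites, 5-10 tethers.
low = candidate_count(n_fragments=10, n_sites=3, n_tethers=5)
high = candidate_count(n_fragments=20, n_sites=4, n_tethers=10)
print(low, high)
```

Under this model the low end of the quoted ranges already gives 25,000 candidates and the high end gives 160 million, before alternative docked poses of each fragment are counted.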
SPROUT has been a pioneering tool in the field of de novo design and places emphasis on the pragmatic incorporation of tools to do ‘triage’ as one designs in the binding pocket of the target of interest. While one can do initial triage of the compounds assembled from docked fragments based on predicted binding interaction scores (affinity), this is often not the correct approach. The reason is that small fragments, or small compounds tethered from just 1-2 fragments, may typically have only 10-100 µM affinity. As more fragments are added to the initially designed de novo leads, the affinity may improve substantially, so pruning on affinity at this early stage risks discarding leads that would later bind well. For this reason it is wise to ‘prune’ first based on the synthetic accessibility of the de novo leads.

Figure 1 below shows 2 hydrogen bond acceptor sites and a hydrophobic region recognized by the native ligand in the beta-secretase crystal structure 2OHP. Selecting several polar fragments and tethering elements in a 4-hour SPROUT run resulted in 93843 compounds (Figure 2) that interact favorably with the 3 sites recognized by Compound 3 in the manuscript reporting this crystal structure (J. Med. Chem. 50: 1124, 2007). SPROUT enables one to rapidly derive a total interaction score for each of these compounds, but what we really want to do in the design process is to examine the ease with which the designed compounds can be synthesized. De novo design of compounds that are not synthetically accessible is a meaningless enterprise.
There are two tractable approaches to such predictions available from SimBioSys and KeyModule. The first is a tool embodied in SPROUT and in a new product, TOPOMAX, that assesses synthetic ‘complexity’ (Boda and Johnson, J. Med. Chem., 2006, 49:5869-5879). The synthetic complexity approach is based on the compilation of a large number of compounds synthesized over the years and a detailed analysis of their substitution patterns in rings and chains. The concept is really an informatic principle: given sufficient sampling of a large space of synthesized compounds, the observed frequency of occurrence of particular structural substitution and topological patterns should reflect the synthetic accessibility of a de novo compound. In their 2006 manuscript, Boda and Johnson showed how compiling and using a large synthetic complexity database allowed them to score compounds rapidly, and that the synthetic complexity score correlated well with the second approach, the computation of synthetic accessibility using medchem-encoded PATCHEM rules in CAESA (Figure 11, J. Med. Chem., 2006, 49:5869-5879).
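The informatic principle can be illustrated with a toy sketch. This is not the Boda and Johnson algorithm or the SPROUT implementation: the pattern labels, the tiny reference set, and the negative-log scoring with add-one smoothing are all assumptions made for illustration. The only idea shown is that patterns seen often among synthesized compounds contribute little to the score, while rare or unseen patterns contribute a lot.

```python
import math
from collections import Counter


def build_frequency_table(reference_compounds):
    """Count in how many reference compounds each pattern occurs.

    Each compound is a list of hypothetical pattern labels, e.g. ring or
    chain substitution patterns.
    """
    counts = Counter()
    for patterns in reference_compounds:
        counts.update(set(patterns))  # count each pattern once per compound
    return counts, len(reference_compounds)


def complexity_score(patterns, counts, n_reference):
    """Toy score: sum of -log(frequency) over the compound's patterns.

    Add-one smoothing keeps unseen patterns finite but heavily penalized.
    """
    score = 0.0
    for p in set(patterns):
        freq = (counts.get(p, 0) + 1) / (n_reference + 1)
        score += -math.log(freq)
    return score


# Invented reference set; the labels are illustrative only.
reference = [
    ["para-disubstituted benzene", "amide"],
    ["amide"],
    ["para-disubstituted benzene", "ester"],
]
counts, n = build_frequency_table(reference)
common = complexity_score(["amide"], counts, n)
rare = complexity_score(["spiro-fused ring"], counts, n)
print(common < rare)
```

A lookup of this kind is cheap per compound, which is consistent with the claim that frequency-based scoring can be applied to very large virtual libraries.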
How and why is that useful? Figure 3 shows the synthetic complexity scores computed within SPROUT for all 93843 compounds. The entire computation required just 35 minutes of CPU time on a single processor, or roughly 45 compounds per second, and SPROUT allows one to do this in the course of a de novo run. Figure 4 shows a small 262-compound sample from the present beta-secretase lead-hop run, plotting the CAESA batch analysis of the compounds' synthetic accessibility against the SPROUT synthetic complexity. The correlation is not perfect, but it is clear that there is, generally speaking, a monotonic trend.
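One common way to quantify a monotonic trend of this kind is the Spearman rank correlation. The sketch below is a minimal pure-Python version, offered as an assumption about how one might check the trend rather than as the analysis behind Figure 4; the function names and the example scores are invented for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Invented scores for five compounds, not data from the run described above.
complexity = [12.0, 30.5, 18.2, 44.0, 25.1]
accessibility = [0.9, 0.4, 0.7, 0.2, 0.5]
print(spearman_rho(complexity, accessibility))
```

A value near +1 or -1 indicates a strongly monotonic relationship. The sign depends on the direction conventions of the two scores, which are not specified here.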
This spectrum of tools allows one to pare the number of virtual compounds down to hundreds or thousands of virtual ligands using a combination of synthetic complexity and interaction scores in SPROUT (and/or TOPOMAX), with confidence that the complexity triage is indeed linked to the ‘ease of synthesis’. One can then obtain a synthetic accessibility score and complete retrosynthetic pathways to known starting materials for a subset of compounds in a matter of minutes using CAESA batch.
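The two-stage funnel described above can be sketched as a simple filter-then-rank step. Everything in this sketch is hypothetical: the Candidate fields, the thresholds, and the conventions that a lower complexity means easier synthesis and a lower interaction score means better predicted binding are assumptions, not the SPROUT or CAESA interfaces.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    complexity: float   # assumed convention: lower = easier to synthesize
    interaction: float  # assumed convention: lower = better predicted binding


def triage(candidates, max_complexity, keep_top):
    """Prune on synthetic complexity first, then rank by interaction score."""
    accessible = [c for c in candidates if c.complexity <= max_complexity]
    ranked = sorted(accessible, key=lambda c: c.interaction)
    return ranked[:keep_top]


# Invented candidates for illustration.
pool = [
    Candidate("a", complexity=10.0, interaction=-5.0),
    Candidate("b", complexity=50.0, interaction=-9.0),
    Candidate("c", complexity=20.0, interaction=-7.0),
    Candidate("d", complexity=15.0, interaction=-3.0),
]
shortlist = triage(pool, max_complexity=25.0, keep_top=2)
print([c.name for c in shortlist])
```

In this invented pool, candidate "b" has the best interaction score but is removed by the complexity filter before ranking. The shortlist would then be the subset passed on for full retrosynthetic analysis.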
This example problem highlights the importance of using the correct tool for the task. De novo design commonly leads to a combinatorial explosion if the user does not perform triage on candidates based on both synthetic complexity (accessibility) and predicted binding. SimBioSys has a range of tools to gauge synthetic accessibility (CAESA and CAESA batch) and docking tools with good Score-log(Kd) predictivity (eHiTS and eHiTS Lightning). The goal of SimBioSys tools is always to achieve speed without compromising accuracy. Look for a new technical note on this topic under Science:White papers on our web site next week!
Posted by Dan Harris
