Can we trust docking results?
Friday, September 3rd, 2010
The question is asked and answered by a group of researchers from the University of Warsaw in a recently published paper (http://onlinelibrary.wiley.com/doi/10.1002/jcc.21643/abstract). They compared seven docking and scoring programs to evaluate pose prediction and scoring accuracy on a large set of 1300 PDB complexes. The study is fairly thorough and asks some important questions, such as how the starting ligand conformation influences the results, and how the results differ for small versus large ligands and for mostly hydrophobic versus mostly polar interactions. The good news they report is that, statistically, the overall results do not seem to be influenced by the starting conformation, although some programs show a slight advantage when started from the X-ray conformation, which is understandable. The bad news is that ligand size does matter: while we are very successful with small, fairly rigid molecules, large floppy ones still prove hard to handle for all programs. The really ugly news is that none of the scoring functions provided adequate correlation with binding energy.
“On the basis of those results, we can order programs in the following way: GOLD ~ eHiTS > Surflex > Glide > LigandFit > FlexX > AutoDock. The best programs have the average RMSD top score around 2.7 Å, and it increases to nearly 4.5 Å for the weakest FlexX. As expected, better results were observed for best pose conformations (Fig. 4). For those poses, the mean RMSD value was even below 2 Å for GOLD, eHiTS, and Surflex. … Moreover, the percentage of pairs for which top score conformation is below 2 Å shows that even for the best programs the success rate is below 60%, and in some cases even below 40%.”
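The two metrics quoted above, pose RMSD and the share of complexes with a pose under 2 Å, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: atoms are matched by index, no symmetry correction is applied, and the function names `rmsd` and `success_rate` are ours, not taken from the paper or from any docking package.

```python
import math

def rmsd(pose, reference):
    """RMSD in Å between two lists of (x, y, z) atom positions.

    Assumes both lists hold the same atoms in the same order and that the
    coordinates share the receptor frame (no superposition is performed).
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("pose and reference must be non-empty and equally long")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pose, reference)
    )
    return math.sqrt(squared / len(pose))

def success_rate(rmsd_values, cutoff=2.0):
    """Fraction of complexes whose pose RMSD falls below the cutoff (Å)."""
    return sum(1 for value in rmsd_values if value < cutoff) / len(rmsd_values)

# Made-up coordinates and RMSD values, for illustration only.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(docked, crystal))
print(success_rate([0.8, 1.9, 2.5, 4.1]))
```

A success rate "below 60%" in the sense of the quote means that fewer than 6 in 10 complexes have a top-scored pose with an RMSD under the 2 Å cutoff.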
A new ARChem release: integrable, more efficient and better performing
Tuesday, July 6th, 2010
One of the aspects of maturation is the transition from an egocentric viewpoint to a phase where one engages with and considers others. It is true for kids who begin to understand and cope with social situations. It is true for soccer players, or scientists for that matter, who understand that it is not all about personal skills and knowledge, but also about how you utilize those in team play. And it is true for software applications that shift from the stage of proving their algorithms’ capabilities to becoming integrable with other applications and merging into a workflow that creates real value for the user.
Since the previous release, work has continued on improving reaction rule generation in ARChem as well as the retrosynthetic search. Significant progress has been made in detecting and highlighting potential functional group interference. The chemoselectivity issue is a challenge that requires a combination of data mining, profound chemical perception, and supplemental expert knowledge bases. Another area that recorded a significant improvement is scoring. The retrosynthetic search commonly generates a vast solution space with hundreds, and possibly thousands, of paths. Navigating systematically through all the options is typically too time-consuming, and scoring becomes pivotal in prioritizing the solutions for the user to inspect. Scoring now better reflects a chemist’s assessment of the feasibility of a synthetic route. It balances synthetic depth, reliability of individual reaction steps, yield, wastage, chemical interference and other considerations.
Alongside the major improvements in the underlying technology, the focus of the last few months has been on usability and integrability:
- Reaction examples are directly linked to the Reaxys records for full data and literature access.
- Starting materials arrived at during the search now link to the corresponding records in online chemical vendors’ catalogues.
- Costs of starting materials are displayed, and route cost is evaluated.
- When a rule is used in the analysis, the example reactions that were used to generate this rule are now ordered by relevance to the synthetic route.
- The solution space can be pruned using user-defined filters.
- Changes to the GUI make navigating the solutions more efficient, and the general look and feel of the system is more polished and refined.
Here is an example that demonstrates some of the features mentioned above, and also elegantly validates the concept of automated retrosynthetic chemistry. The suggested route was ranked number 1 by the system. It consists of a sequence of three reaction rules that simplify the target all the way to commercially available starting materials, shown with their associated prices per mole. In this particular case, all the suggested transforms were found as exact matches in the sets of reactions that generated the respective rules during the automated process of retrosynthetic rule extraction. All the examples, and the exact matches, can be accessed via the links provided along the retrosynthetic tree. At the bottom right we show a literature reference for a synthesis of the molecule, validating the route. ARChem offers a set of 28 distinct solutions that constitute a gateway to a much larger solution space that can be accessed through the “n of m transforms” links. The user can build different solutions by selecting any of the suggested alternative transforms.
ARChem has come a long way from its proof-of-concept days. It is now maturing into a tool that can offer real benefits to the medicinal or process chemist, not least thanks to the continuous feedback that we get from users. In the next few months substantial changes are anticipated in all aspects of the system. Maturity does not mean stagnation – ARChem is at the forefront of the field of computer-aided synthesis design, and intensive R&D guarantees that major advances are still to come. Stay tuned.
posted by Orr
Induced Protonation State Changes Upon Binding
Wednesday, March 31st, 2010
An interesting article was published recently in the Biophysical Journal (Volume 98, Issue 5, 872-880, 3 March 2010, doi:10.1016/j.bpj.2009.11.016), in which biophysicists recognise the importance of protonation state changes induced upon binding, and mention that one of the key practical applications is in structure-based drug design.
Dr. Alexey Onufriev from Virginia Tech and his team investigated three types of ligands (small molecule, protein and nucleic acid) and their ionization state changes upon protein-ligand binding. They concluded that in all three cases substantial changes can be observed in both the ligand and the receptor ionization states upon binding.
This is a very important observation for virtual screening and docking, because it confirms our belief that the protonation states of proteins and ligands cannot and should not be prepared and / or fixed in advance for virtual screening experiments. Therefore eHiTS’ method of assigning the protonation states on-the-fly is probably the best method offered to date to solve this problem. For more info on eHiTS’ automated protonation state handling, please see our technical note with the same title on this page: http://www.simbiosys.com/ehits/ehits_technical_notes.html
posted by Aniko
CLiDE – making chemical information a lot more accessible
Wednesday, January 27th, 2010
As scientists we all learn to cope with ever-growing amounts of information, coming from various sources. Scientific information, like virtually all types of information, is predominantly delivered in electronic formats – journal articles, patents, e-books, wiki pages, blogs, etc. We need this information to be readily accessible and searchable; we archive it on our personal PCs, and on our organization’s servers and knowledge bases. As chemists, we have wonderful visualization techniques that allow us to sift through incredible amounts of data and information, but precisely here there is a strong disconnect between the availability of information and its accessibility. 2D images of molecules are pivotal to the way we digest chemistry, and yet, as images, they are not readily amenable to our data mining tools. It would be great if publishers of chemistry articles were to retain the original structures in their electronic documents, and there is no doubt that this will happen some time in the future. But, for now, we need a tool that can translate chemistry images into a connection table format, which would allow integration of data from the literature into existing chemistry software.
CLiDE is an optical chemical structure recognition engine. It extracts connection tables of molecules from 2D images in various formats: PDF, PostScript, JPEG, BMP, PNG, and TIFF. CLiDE has been around for some time now, but in the last two years it finally got the development boost it deserved, making it a cool and useful instrument for every chemist’s toolkit. It is now equipped with a sleek GUI that can be used to read .pdf as well as a variety of image file formats. Any time you come across a structure of interest, simply select it, extract it, and save it, or send it to your favourite chemical editor (currently ChemDraw, ISISDraw and SymyxDraw are supported). We all know the feeling of looking at a page full of structures that are relevant to our work and wanting to transfer them to another application such as an Excel spreadsheet or a docking program, but redrawing them using a graphic editor is tedious and prone to mistakes. CLiDE takes away this hassle. It comes in three flavours that can process a single image at a time (standard), a whole document at a time (professional), or a full library of documents in one go (batch).
Below you’ll find a demo clip of the new CLiDE product. Please contact us to obtain a password to watch it.
ARChem 2009.1 is released
Thursday, December 10th, 2009
2009 has been a year of major progress for ARChem, and the system has hit a number of significant milestones that secured its leading position in the field. We wanted to share a few of our achievements, and to extend our gratitude to many users whose comments have made an impact on the system.
- Chemistry – Several changes to chemical perception algorithms have been implemented. They improve the way target molecules are addressed, and the way reaction rules are extracted and clustered from reaction databases. Those improvements have made a small set of manually coded reaction rules obsolete, and have enhanced the system’s capability to deal with some of the challenging aspects of organic synthesis such as chemical interference, stereochemistry and regioselectivity.
- Data – As a knowledge-based system, ARChem is highly dependent on the quality and quantity of reaction data encapsulated in commercial databases. We are therefore grateful and proud to have further tightened our relationships with two leaders of the chemical information publishing industry: Elsevier and Symyx. Both the CrossFire Beilstein and Cheminform databases have been fully integrated into the system, covering a vast spectrum of chemical reactions and offering valuable supporting information.
- Breaking up starting materials – The search down a branch of the retrosynthetic tree stops whenever a starting material from the educts database is found. Sometimes it is desirable to break such compounds down into even simpler precursors, since they are expensive to purchase, not in stock, etc. The user can now exclude starting materials matching the target molecules, and find synthetic routes to those compounds.
- Viewing solutions – The ability to browse through the manifold of generated solutions has been dramatically improved by a synoptic view of reaction steps. The user can see a “preview” of the various solutions by inspecting the list of the next proposed precursors, and jump directly to the associated solutions.
- System design – ARChem is now a more complete system which can be used not only as a local installation, but also as an online service. A queueing system, security features, accelerated search times and many other features have upgraded the system’s performance, accessibility and usability.
Below is an example of a synthetic route found by ARChem for Maraviroc – an HIV drug that was developed in Pfizer’s labs in Sandwich, UK, and received FDA approval in 2007. ARChem’s solution includes 9 reactions, with 6 steps in the two longest paths. In this case, the retrosynthetic analysis leads all the way back to commercially available starting materials, shown with their corresponding providers and catalog numbers. ARChem supplies a lot more information to complete the experimental details of the synthetic scheme, such as reaction conditions, bibliographic references, and additional starting material providers and catalog numbers.
The above suggested synthetic route has been generated completely automatically with no user intervention. It is a strong demonstration of the huge potential of this concept, and of the accomplishments so far. We look forward to 2010 with plenty of items in the ARChem pipeline, and we are particularly eager to continue the dialogue with our industrial and academic users – a scientific exchange that guarantees that the development process maintains continuous, rigorous and coherent progress.
posted by Orr Ravitz
eHiTS 2009 as a Blind Docking Tool
Tuesday, December 1st, 2009
As the molecular docking paradigm solidifies its status as a significant tool for drug discovery, chemists explore additional applications of the methods in ways that sometimes stretch the existing algorithms to their limits. Most docking programs, including eHiTS, have not been designed or optimized to perform blind docking. In structure-based drug discovery, the user is typically expected to define, at some level of accuracy, the binding pocket in the target of interest. The binding site is determined either from known binding modes of ligands as found in crystal structures of complexes, or from an educated hypothesis. There are cases, however, in which assumptions about the possible locations of binding hot spots are difficult or should be avoided altogether. This is the case, for example, when the existence of secondary binding sites is suspected, or when one would like to screen active ligands and other compounds on a range of targets to estimate the possibility of drug side-effects, toxicity, and other types of biological activities.
The standard eHiTS usage requires a rough definition of the binding pocket. This is done through the clip file, which should contain at least two sets of coordinates (i.e. two spatial points) located in the designated binding pocket. eHiTS then draws a box around those points, expands it to some extent in all directions and places the search grid inside that box. Then, the box is “flooded” with a virtual fluid to detect all the cavities that will define the binding surface. This is a highly automated process, but it still relies on that user-defined clipping. Commonly the native ligand, amino acids from the binding pocket, or a few atoms from either are chosen as a clip file. If eHiTS is run with the -complex option, the clipping coordinates are inferred from the native ligand. However, eHiTS can also be used without any clipping. In this case, the entire receptor will be considered for docking. The whole protein will be flooded, and sufficiently deep clefts will be searched for on its surface. The final space in which docking will be performed is defined by the interconnected pockets found on the target. The search grid in such scenarios is typically large, and extensive sampling is required. Nevertheless, the computational efficiency of the eHiTS algorithm allows good sampling in reasonable timescales.
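The clipping step described above (a box around the clip points, expanded in all directions) can be pictured with a short Python sketch. It is not eHiTS code: the function names and the `margin` value are hypothetical, since the actual expansion distance is not stated here, and the subsequent “flooding” of the box is not modelled.

```python
def clip_box(points, margin=5.0):
    """Axis-aligned box around the clip points, expanded by `margin` Å per side.

    `margin` is a hypothetical value chosen for illustration only.
    """
    if len(points) < 2:
        raise ValueError("a clip file needs at least two spatial points")
    xs, ys, zs = zip(*points)
    lower = (min(xs) - margin, min(ys) - margin, min(zs) - margin)
    upper = (max(xs) + margin, max(ys) + margin, max(zs) + margin)
    return lower, upper

def inside(box, point):
    """True if `point` lies within the box (boundaries included)."""
    lower, upper = box
    return all(lo <= c <= hi for lo, c, hi in zip(lower, point, upper))

# Two made-up points standing in for atoms picked from a binding pocket.
box = clip_box([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)], margin=1.0)
print(box)
print(inside(box, (1.0, 1.0, 1.0)))
```

In blind docking no such box is supplied, so the equivalent search region is the whole receptor, which is why the grid becomes large and more sampling is needed.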
Several eHiTS users have expressed specific interest in blind docking in recent months, and therefore we decided to evaluate eHiTS’ performance in this context. We used the set from an earlier blind docking evaluation (Hetenyi and van der Spoel, 2006 [1]). We focused on the 43 complexes used in the paper and have not attempted to use the apo structures. Three codes (1B70, 1FIW and 1QIZ) were left out because of uncertainty regarding the exact structure used in the paper for docking. The default accuracy (3) was used throughout the study. The average blind docking time was 9 minutes per receptor for this set.
Results:
77.5% of the cases gave at least one conformation under 2 Å in the top 10 poses. In the other cases, one accumulative docking round using poses from the first round as clip files produced successful binding modes in the top 5 poses. The top rank pose is in most cases in the correct binding pocket, offering a good starting point for pose refinement.
The table here details the results for the specific codes. The Job# column describes whether the results were obtained with a single blind docking run, or with 2 cycles. The Rank# and RMSD columns indicate the rank of the first pose under 2 Å and its RMSD from the crystallographic conformation. The last two columns indicate the top-rank and closest poses RMSDs.
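For readers who want to compute the same kind of columns for their own runs, here is a minimal Python sketch. It assumes each docking run is available as a list of pose RMSDs ordered by score rank; the function names and the sample numbers are made up for illustration and are not taken from the study.

```python
def summarize_poses(rmsds_by_rank, cutoff=2.0):
    """Summary of one docking run; index 0 of the input is the top-ranked pose."""
    first_hit = next(
        ((rank, value) for rank, value in enumerate(rmsds_by_rank, start=1)
         if value < cutoff),
        None,
    )
    return {
        "rank": first_hit[0] if first_hit else None,   # rank of first pose under cutoff
        "rmsd": first_hit[1] if first_hit else None,   # RMSD of that pose (Å)
        "top_rank_rmsd": rmsds_by_rank[0],              # RMSD of the top-ranked pose
        "closest_rmsd": min(rmsds_by_rank),             # best RMSD among all poses
    }

def top_n_success(runs, n=10, cutoff=2.0):
    """Fraction of runs with at least one pose under the cutoff in the top n."""
    hits = sum(1 for run in runs if any(value < cutoff for value in run[:n]))
    return hits / len(runs)

# Made-up RMSD lists for two hypothetical complexes.
runs = [[3.2, 1.4, 0.9], [5.0, 4.0, 6.1]]
print(summarize_poses(runs[0]))
print(top_n_success(runs))
```

With the real per-complex RMSD lists, `top_n_success(runs, n=10)` would correspond to the kind of top-10 percentage reported in the results above.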
The blind docking of phenol into insulin (1MPJ) is shown in Picture1 below. The crystallographic pose is shown in cyan, and sample poses are shown in the “hot spots” detected during docking. Those poses can be used to clip the receptor in accumulative docking runs in which the sampling is finer, and the binding pockets are better modelled. It should be noted that this code generates an unusually large number (5) of hot spots. In most cases in the set we observed three, two and often just one hot spot, indicating that the correct binding pocket was detected.

Picture1: Phenol binding to Insulin. Several potential binding pockets are detected for this small ligand.
1NGP (N1G9 FAB fragment) is a case where the majority of poses are generated far from the native ligand. Picture 2 below shows that most of the poses are located in the big cavity between chains L and H of the crystal structure. Several poses, however, reproduce the x-ray binding mode (in cyan) with close to 1 Å RMSD.

Picture2: 2-(4-hydroxy-3-nitrophenyl)acetic acid docked into N1G9 FAB fragment. The majority of poses are located in the big cavity between chains L and H.
Conclusions:
The above results clearly demonstrate the viability of eHiTS as a blind docking tool. In all cases the correct binding pocket has been identified in the top 32 solutions, and in most cases good poses under 2 Å and even 1 Å were found at the top of the generated poses. The conformations may be further refined by clipping the receptor for subsequent runs, and by working at higher accuracies. As always in eHiTS, the jobs are extremely easy to set up with a simple command line, and require no preparation of the receptor or the ligand. This, together with the speed of the calculations, makes eHiTS a high-throughput blind docking solution.
Reference:
[1] Hetenyi, C., Van der Spoel, D.: “Blind docking of drug-sized compounds to proteins with up to a thousand residues.” 2006 Feb 20;580(5):1447-50. Epub 2006 Jan 31.
Blind docking results for eHiTS 2009.1, using the test set of Hetenyi et al. [1]
by Orr Ravitz
eHiTS 2009.1 is released
Friday, November 27th, 2009
We are pleased to announce the release of the 2009.1 docking and screening portfolio, which includes the eHiTS, Tune, Score and LASSO packages for the Intel and Cell platforms. This is an important bug-fix release that resolves instability issues, improves the accuracy of docking, and introduces several new features.
One illustration of the progress is the Top Ranking and Closest Pose accuracy analysis on the Astex 85 [1, 2] test set (figures below) where 5-10% improvement can be observed in almost every category:


Overall, the improvement between 2009.0 and 2009.1 is 0.36 Å (i.e. 17%) in the top-ranking average RMSD, and 0.17 Å (i.e. 18%) in the closest-pose average RMSD.

For a more detailed report on what is new in the package, please see the release notes under the general docs pages: http://www.simbiosys.ca/docs/
References:
[2] Astex diverse dataset: http://www.ccdc.cam.ac.uk/products/life_sciences/gold/validation/astex_diverse/
http://www.ccdc.cam.ac.uk/products/life_sciences/gold/validation/downloads/download.php4
Useful scripts available as free download on the SimBioSys website
Tuesday, November 24th, 2009
We recently posted on the CCL (Computational Chemistry List) that some useful scripts for the molecular modelling Linux / Unix community are available for free on our website. Some of these are specific to eHiTS users, some are more general-purpose. They are all available free of charge here: http://www.simbiosys.ca/download/scripts/index.html
Bookmark the above site, as we’ll keep updating it with more and more scripts.
A novel BACE-1 inhibitor discovered using eHiTS
Wednesday, November 4th, 2009
It always feels good when your product is successful in the hands of the end users, even more so when it comes to scientific software and drug discovery.
A new article in Elsevier’s Bioorganic & Medicinal Chemistry Letters describes how researchers at the University of Leeds discovered novel non-peptide leads for β-secretase (BACE-1) - one of the key enzymes involved in the pathogenesis of Alzheimer’s disease, and a major target for drug discovery.
It is particularly exciting for us to know that our tools may play an instrumental role in finding a cure for a disease that affects so many beloved people in our lives.
The paper:
Interesting read: London Stock Exchange dumps Windows for Linux
Thursday, October 15th, 2009
ComputerWorld reported on Oct 7th, 2009:
When it comes to business computer systems, nothing is more mission-critical than the massive trading software systems that underlie stock markets. A failure of an hour here can mean billions of dollars of lost trades….
See the article: London Stock Exchange dumps Windows for Linux
Bottom line: the London Stock Exchange (LSE) had so many troubles and scandals due to software problems (crashes, slowness, etc.) in the past, all related to their Windows 2003-based servers, that they decided to look for a Linux replacement, which seems to be a more reliable, faster and a lot cheaper solution. Their counterpart in the USA, the NYSE, had done so a long time ago.
I hope the pharmaceutical companies do not consider their operations less mission critical.
posted by: Aniko


