Can we trust docking results?
Friday, September 3rd, 2010
The question is asked and answered by a group of researchers from the University of Warsaw in a recently published paper (http://onlinelibrary.wiley.com/doi/10.1002/jcc.21643/abstract). They compared 7 docking and scoring programs, evaluating pose prediction and scoring accuracy on a large set of 1300 PDB complexes. The study is fairly thorough and asks some important questions, such as how the starting ligand conformations influence the results, and how the results differ for small or large ligands and for mostly hydrophobic or mostly polar interactions. The good news they report is that, statistically, the overall results do not seem to be influenced by the starting conformations, although some programs show a slight advantage for the X-ray conformation, which is understandable. The bad news is that ligand size does matter: while we are very successful with small, fairly rigid molecules, large floppy ones still prove hard to handle for all programs. The really ugly news is that none of the scoring functions provided adequate correlation with binding energy.
“On the basis of those results, we can order programs in the following way: GOLD ~ eHiTS > Surflex > Glide > LigandFit > FlexX > AutoDock. The best programs have the average RMSD top score around 2.7 Å, and it increases to nearly 4.5 Å for the weakest FlexX. As expected, better results were observed for best pose conformations (Fig. 4). For those poses, the mean RMSD value was even below 2 Å for GOLD, eHiTS, and Surflex. … Moreover, the percentage of pairs for which top score conformation is below 2 Å shows that even for the best programs the success rate is below 60%, and in some cases even below 40%.”
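The metrics in the quote (mean top-score RMSD, mean best-pose RMSD, and the fraction of complexes whose top-scoring pose is under 2 Å) are simple to compute once each docked pose has been compared with the crystal ligand. The sketch below is a minimal illustration, not the paper's evaluation code: it assumes the poses and the crystal ligand share the receptor's coordinate frame and atom ordering, and it ignores molecular symmetry, which real benchmarks usually correct for.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """In-place RMSD between a docked pose and the crystal ligand.

    No superposition is done, because docked poses are already in the
    receptor frame. Assumes identical atom ordering; symmetry is ignored.
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("pose and reference need the same, non-zero atom count")
    squared = sum(
        (a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r)
    )
    return math.sqrt(squared / len(pose))


def summarize(ranked_rmsds: Sequence[Sequence[float]],
              cutoff: float = 2.0) -> dict[str, float]:
    """ranked_rmsds[i] lists the RMSDs of complex i's poses in score order."""
    n = len(ranked_rmsds)
    top = [poses[0] for poses in ranked_rmsds]    # top-scoring pose per complex
    best = [min(poses) for poses in ranked_rmsds]  # closest pose per complex
    return {
        "mean_top_score_rmsd": sum(top) / n,
        "mean_best_pose_rmsd": sum(best) / n,
        "top_score_success_rate": sum(t < cutoff for t in top) / n,
    }


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]  # rigidly shifted by 1 Å
    print(rmsd(docked, crystal))  # 1.0
    print(summarize([[2.5, 1.2], [1.0, 3.0]]))
```

Note that the "best pose" statistic always looks better than the "top score" one, because it credits the program for a good pose even when its own scoring function did not rank that pose first.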
SimBioSys presentations at the Fall 2010 ACS meeting
Thursday, August 26th, 2010
SimBioSys co-founders Prof. Peter Johnson (UK) and Dr. Zsolt Zsoldos (Canada), along with our collaborator Dr. Sean Ekins (USA), delivered five talks at this past ACS meeting. They said the meeting was a great opportunity to catch up with people they know and to meet new people, and that many of the other talks they attended were inspiring. As they make their journeys back home, their presentations are being posted here to share the science with you:
http://www.simbiosys.com/science/presentations/index.html
Peter Johnson et al.: “Automated retrosynthetic analysis: An old flame rekindled”
Slides: http://www.simbiosys.com/science/presentations/2010-08-acs/ARChem_ACS_Boston_2010_final.pdf
Sean Ekins et al.: “LASSO-ing potential pregnane X receptor agonists”
Slides: http://www.simbiosys.com/science/presentations/2010-08-acs/ACS2010_LASSO.pdf
Zsolt Zsoldos et al.: “How eHiTS solves the docking and scoring problems”
Slides: http://www.simbiosys.com/science/presentations/2010-08-acs/ACS2010_eHiTS_lessons.pdf
Zsolt Zsoldos et al.: “Scoring performance of eHiTS on the CSAR dataset”
Slides: http://www.simbiosys.com/science/presentations/2010-08-acs/ACS2010_CSAR_ehits_score.pdf
Zsolt Zsoldos et al.: “Protein-ligand docking on the Cell/BE processor with eHiTS Lightning”
Slides: http://www.simbiosys.com/science/presentations/2010-08-acs/ACS2010_HPC_ehits.pdf
posted by Aniko
Meet up with SimBioSys at the Fall ACS Meeting in Boston in 10 days
Wednesday, August 11th, 2010
Only 10 days remain until the ACS meeting in Boston (Sun Aug 22 - Thurs Aug 26), and most attendees are preparing their personal schedules: the must-attend lectures, the booths on the expo floor showing the latest and greatest technology, and the social networking and get-togethers.
SimBioSys will be there, of course. We will be showcasing our latest product releases at booth #945. The focus will be on the new:
* ARChem 2010 release:
http://www.simbiosys.com/blog/2010/07/06/a-new-archem-release-integrable-more-efficient-and-better-performing/
* The upcoming CLiDE v4.0 release, currently in BETA testing, which shows significant improvement in recognizing chemical structures from PDF files and images.
* eHiTS, an exciting participant in the first CSAR benchmark exercise!
The science, algorithms and software design that are embodied in these products will be discussed in five different talks given by:
Zsolt Zsoldos:
COMP 25
How eHiTS solves the docking and scoring problems
Session: Drug Discovery (08:30 AM - 11:45 AM)
Time: Sunday, August 22, 2010 09:30 AM
Location: Boston Convention & Exhibition Center
Room: Room 154
COMP 59
Protein-ligand docking on the Cell/BE processor with eHiTS Lightning
Session: Scripting & Programming
Time: Sunday, August 22, 2010 03:50 PM
Location: Boston Convention & Exhibition Center
Room: Room 157A
COMP 122
Scoring performance of eHiTS on the CSAR dataset
Session: The Community Structure-Activity Resource (CSAR) Scoring Challenge (09:00 AM - 11:50 AM)
Time: Monday, August 23, 2010 10:05 AM
Location: Boston Convention & Exhibition Center
Room: Room 157B
Peter Johnson:
CINF 42
Automated retrosynthetic analysis: An old flame rekindled
Session: The Journal of Chemical Information and Modeling’s 50th Anniversary Symposium
Time: Monday, August 23, 2010 11:40 AM
Location: Boston Convention & Exhibition Center
Room: Room 156A
Sean Ekins, who has been collaborating with us over the past few months:
TOXI 4
LASSO-ing potential pregnane X receptor agonists
Session: General Papers
Time: Sunday, August 22, 2010 09:00 AM
Location: Boston Convention & Exhibition Center
Room: Room 252B
We hope you will find these talks interesting and that you will catch up with the SimBioSys researchers after the presentations or at the booth. If you would like to schedule a meeting with us in advance, please contact: aniko *at* simbiosys dot com
Have a great ACS meeting, and an enjoyable trip to Boston.
posted by Aniko
A new ARChem release: integrable, more efficient and better performing
Tuesday, July 6th, 2010
One aspect of maturation is the transition from an egocentric viewpoint to a phase where one engages with and considers others. It is true for kids who begin to understand and cope with social situations. It is true for soccer players, or scientists for that matter, who understand that it is not all about personal skills and knowledge, but also about how you use them in team play. And it is true for software applications that shift from proving their algorithms’ capabilities to becoming integrable with other applications, merging into a workflow that creates real value for the user.
Since the previous release, work has continued on improving both reaction rule generation in ARChem and the retrosynthetic search. Significant progress has been made in detecting and highlighting potential functional-group interference. Chemoselectivity is a challenge that requires a combination of data mining, profound chemical perception, and supplemental expert knowledge bases. Another area with significant improvement is scoring. The retrosynthetic search commonly generates a vast solution space with hundreds, and possibly thousands, of paths. Navigating systematically through all the options is typically too time-consuming, so scoring becomes pivotal in prioritizing the solutions for the user to inspect. Scoring now better reflects a chemist’s assessment of the feasibility of a synthetic route, balancing synthetic depth, reliability of individual reaction steps, yield, wastage, chemical interference and other considerations.
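ARChem’s actual scoring function is not given in this post, so the following is only a hypothetical illustration of how criteria like those listed above can be folded into one rankable number. Every field name and weight here (`reliability`, `waste_ratio`, `w_depth`, `interference_penalty` and so on) is an assumption made for the example. Working in log space lets the overall yield and reliability of a linear route accumulate step by step as a sum.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    """One reaction step in a linear route. All fields are hypothetical."""
    reliability: float   # estimated chance the step works as planned, in (0, 1]
    yield_: float        # expected fractional yield, in (0, 1]
    waste_ratio: float   # mass of waste per mass of product
    interference: bool   # True if a functional-group interference was flagged


def route_score(steps: list[Step], w_depth: float = 0.2, w_waste: float = 0.05,
                interference_penalty: float = 1.0) -> float:
    """Higher is better. Illustrative weights only, not ARChem's."""
    score = 0.0
    for s in steps:
        # log(product of probabilities) == sum of logs
        score += math.log(s.reliability) + math.log(s.yield_)
        score -= w_waste * s.waste_ratio
        if s.interference:
            score -= interference_penalty
    return score - w_depth * len(steps)  # penalize synthetic depth


def rank_routes(routes: dict[str, list[Step]]) -> list[str]:
    """Return route names ordered from most to least attractive."""
    return sorted(routes, key=lambda name: route_score(routes[name]), reverse=True)


if __name__ == "__main__":
    short = [Step(0.9, 0.8, 2.0, False), Step(0.9, 0.85, 1.5, False)]
    longer = short + [Step(0.95, 0.9, 1.0, True)]
    print(rank_routes({"short": short, "longer": longer}))  # ['short', 'longer']
```

The hard part, as the post says, is the balance: the weights decide whether one extra reliable step outranks a shorter route with a flagged interference, and no single setting matches every chemist’s judgement.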
Alongside the major improvements in the underlying technology, the focus of the last few months has been on usability and integrability:
- Reaction examples are directly linked to the Reaxys records for full data and literature access.
- Starting materials arrived at during the search point to the corresponding records in online chemical vendors’ catalogues.
- Costs of starting materials are displayed, and route cost is evaluated.
- When a rule is used in the analysis, the example reactions that were used to generate it are now ordered by relevance to the synthetic route.
- The solution space can be pruned using user-defined filters.
- Changes to the GUI make solution navigation more efficient, and the general look and feel of the system is more polished and refined.
Here is an example that demonstrates some of the features mentioned above, and also elegantly validates the concept of automated retrosynthetic chemistry. The suggested route was ranked number 1 by the system. It consists of a sequence of three reaction rules that simplify the target all the way to commercially available starting materials, shown with their associated prices per mole. In this particular case, every suggested transform was found as an exact match in the set of reactions that generated the respective rule during the automated retrosynthetic-rule extraction. All the examples, and the exact matches, can be accessed via the links provided along the retrosynthetic tree. At the bottom right we show a literature reference for a synthesis of the molecule that validates the route. ARChem offers a set of 28 distinct solutions that constitute a gateway to a much larger solution space, which can be accessed through the “n of m transforms” links. The user can build different solutions by selecting any of the suggested alternative transforms.
ARChem has come a long way from its proof-of-concept days. It is now maturing into a tool that can offer real benefits to the medicinal or process chemist, not least thanks to the continuous feedback we get from users. Substantial changes are anticipated in all aspects of the system over the next few months. Maturity does not mean stagnation – ARChem is at the forefront of computer-aided synthesis design, and intensive R&D guarantees that major advances are still to come. Stay tuned.
posted by Orr
An interesting discussion on: The Ideal Synthesis
Wednesday, June 30th, 2010
We are dedicated readers of Derek Lowe’s wonderful blog about the pharmaceutical industry and drug discovery. In his witty style, Lowe covers many aspects of this field and sheds light on facets that are not always obvious to people like us who are not directly involved in drug discovery. It is no surprise that the blog has attracted a sizable group of commenters who add their own experienced perspectives to the posts.
One of his latest entries discussed the concept of the “Ideal Synthesis”. Although the notion is largely elusive, what constitutes a good synthesis is a question we constantly discuss among ourselves and with ARChem’s users. After all, ARChem typically generates a whole range of synthetic routes to a target molecule, and while the user can browse through them all and choose the most useful routes for the specific scenario, the system also offers its own prioritization of the solutions as a means of assistance. This rank ordering of synthetic routes tries to mimic a chemist’s perspective, but that perspective is not itself well defined. Although we know the essential components, such as yield, minimal wastage, few synthetic steps, and robust reactions, striking the (or a) right balance between them is tricky. Lowe’s blog post, the paper it refers to, and the ensuing discussion there are very helpful.
links:
http://pipeline.corante.com/archives/2010/06/29/the_ideal_synthesis.php
http://pubs.acs.org/doi/abs/10.1021/jo1006812
posted by Aniko
Induced Protonation State Changes Upon Binding
Wednesday, March 31st, 2010
There was an interesting article published recently in the Biophysical Journal (Volume 98, Issue 5, 872-880, 3 March 2010, doi:10.1016/j.bpj.2009.11.016), in which biophysicists recognise the importance of binding-induced changes in protonation state, and mention that one of the key practical applications of this effect is in structure-based drug design.
Dr. Alexey Onufriev from Virginia Tech and his team investigated three types of ligands (small molecules, proteins and nucleic acids) and their ionization state changes upon protein-ligand binding. They concluded that in all three cases substantial changes can be observed in the ionization states of both the ligand and the receptor upon binding.
This is a very important observation for virtual screening and docking, because it confirms our belief that the protonation states of proteins and ligands cannot and should not be prepared and/or fixed in advance for virtual screening experiments. Therefore eHiTS’ method of assigning protonation states on-the-fly is probably the best method to date for this problem. For more information on eHiTS’ automated protonation state handling, please see our technical note with the same title on this page: http://www.simbiosys.com/ehits/ehits_technical_notes.html
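Why a fixed, pre-assigned protonation state can go wrong is easy to see from the Henderson-Hasselbalch relation: the protonated fraction of a titratable group depends on pH - pKa, and binding can shift the pKa. The sketch below is only an illustration of that relation; the pKa values are invented for the example and are not taken from the Onufriev paper or from eHiTS.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable site in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    ph = 7.4
    pka_free = 6.5   # hypothetical pKa of a ligand base in solution
    pka_bound = 8.0  # hypothetical pKa after a binding-induced shift
    free = fraction_protonated(ph, pka_free)
    bound = fraction_protonated(ph, pka_bound)
    print(f"free: {free:.2f}, bound: {bound:.2f}")  # free: 0.11, bound: 0.80
```

With these illustrative numbers the dominant form flips from neutral to protonated on binding, so a state fixed before docking would describe the free ligand, not the bound complex.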
posted by Aniko
Presentations at the Fields Institute and at the Spring 2010 ACS meeting
Monday, March 8th, 2010
The Fields Institute, located in Toronto, is a center for mathematical research activity: a place where mathematicians from Canada and abroad, from business, industry and financial institutions, come together to carry out research and formulate problems of mutual interest.
SimBioSys founder and CSO Zsolt Zsoldos, who is both a mathematician/computer scientist and a chemist, was recently invited to speak at one of the Fields seminars. This was a great honour and recognition of the scientific work he does at SimBioSys with his team of exceptional and talented researchers. The title of the March 2nd, 2010 presentation was “Algorithmic and mathematical challenges in protein-ligand docking and scoring”, a topic that has been a significant part of Zsolt’s work over the past 10 years. Squeezing it into a single one-hour session was a huge challenge in itself. Nevertheless, there were many sparkling eyes in the audience, and hopefully the topic created enough interest that we will see a few more mathematicians in this challenging field of science in the future. You can check out Zsolt’s talk at: http://www.simbiosys.com/science/presentations/index.html#2010
The audio and slides of the talk will also be posted shortly on the Fields Institute’s web site: http://www.fields.utoronto.ca/audio/#optimization_seminar
Another interesting talk by a SimBioSys scientist will be given by Orr Ravitz at the upcoming Spring 2010 ACS meeting in San Francisco. He will be talking about “Improving molecular docking through eHiTS’ tunable scoring function” in the Drug Discovery session on Monday, March 22, 2010 at 10:00 AM.
Abstract: The molecular docking paradigm has been hampered by the lack of a generically well-performing scoring function. We present two complementary family-based approaches for score tuning that improve docking performance using experimental data. One technique treats the relative weights of the eHiTS energy terms as parameters that can be adjusted to improve score-RMSD correlations. The other employs ligand-based similarity to rescale the docking score so that better enrichment factors are achieved in virtual screening. We discuss the algorithmic details of the methods, and demonstrate the effects of score tuning on a variety of targets, including CDK2, BACE1 and AChBP, as well as on common benchmarks. We observe an average improvement of 10% in the top-rank pose RMSD, and a similar improvement in docking success (top pose under 2 Å). An average EF(1%) of 15 is achieved for the targets in the DUD set.
http://abstracts.acs.org/chem/239nm/program/view.php?obj_id=9832&terms=
Should be a discussion starter! Please join us for the session if you’ll be at the ACS meeting in SFO in two weeks, and contact us if you would like to meet with us during the days of the conference.
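The abstract above quotes an average EF(1%). The enrichment factor is commonly defined as the fraction of actives among the top x% of the ranked database divided by the fraction of actives in the whole database, so a random ranking gives about 1. Below is a minimal sketch of that definition, assuming that a lower docking score means a better rank (as with energy-like scores) and rounding the top-x% count to the nearest whole compound; tie handling and the exact cut-off convention vary between studies, and this is not the code used in the talk.

```python
def enrichment_factor(scores: list[float], is_active: list[bool],
                      fraction: float = 0.01, lower_is_better: bool = True) -> float:
    """EF(x%) = (actives in top x% / size of top x%) / (all actives / all compounds)."""
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n != len(is_active) or n_actives == 0:
        raise ValueError("need equal-length, non-empty inputs with at least one active")
    n_top = max(1, round(fraction * n))  # number of compounds in the top x%
    order = sorted(range(n), key=lambda i: scores[i], reverse=not lower_is_better)
    hits = sum(is_active[i] for i in order[:n_top])
    return (hits / n_top) / (n_actives / n)


if __name__ == "__main__":
    scores = [float(i) for i in range(1000)]                # compound i has score i
    active_ids = {0, 2, 4, 6, 8, 9} | set(range(100, 114))  # 20 actives in total
    labels = [i in active_ids for i in range(1000)]
    # 6 of the 20 actives fall in the top 10 compounds (1%)
    print(f"{enrichment_factor(scores, labels):.1f}")  # 30.0
```

Because the top 1% of a small database may hold only a handful of compounds, EF(1%) can jump noticeably when a single active moves in or out of the cut-off.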
posted by Aniko
ARChem 2009.1 is released
Thursday, December 10th, 2009
2009 has been a year of major progress for ARChem, and the system has hit a number of significant milestones that secured its leading position in the field. We wanted to share a few of our achievements, and to extend our gratitude to the many users whose comments have made an impact on the system.
- Chemistry – Several changes to chemical perception algorithms have been implemented. They improve the way target molecules are handled, and the way reaction rules are extracted from reaction databases and clustered. These improvements have made a small set of manually coded reaction rules obsolete, and have enhanced the system’s capability to deal with some of the challenging aspects of organic synthesis such as chemical interference, stereochemistry and regioselectivity.
- Data – As a knowledge-based system, ARChem is highly dependent on the quality and quantity of the reaction data encapsulated in commercial databases. We are therefore grateful and proud to have further tightened our relationships with two leaders of the chemical information publishing industry: Elsevier and Symyx. Both the CrossFire Beilstein and Cheminform databases have been fully integrated into the system, covering a vast spectrum of chemical reactions and offering valuable supporting information.
- Breaking up starting materials – The search down a branch of the retrosynthetic tree stops whenever a starting material from the educts database is found. Sometimes it is desirable to break such compounds down to even simpler precursors, for example because they are expensive to purchase or not in stock. The user can now exclude starting materials that match specified molecules, and have the search find synthetic routes to those compounds instead.
- Viewing solutions – Browsing through the manifold of generated solutions has been dramatically improved by a synoptic view of reaction steps. The user can see a “preview” of the various solutions by inspecting the list of the next proposed precursors, and jump directly to the associated solutions.
- System design – ARChem is now a more complete system which can be used not only as a local installation, but also as an online service. A queueing system, security features, accelerated search times and many other features have improved the system’s performance, accessibility and usability.
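The “Breaking up starting materials” behaviour amounts to a stopping rule in a recursive search: a branch ends at a purchasable compound unless the user has excluded it, in which case the search keeps disconnecting it. The sketch below is a toy illustration only; the string molecule names, the `disconnections` lookup table and the depth limit are hypothetical stand-ins for ARChem’s reaction rules and educts database.

```python
from typing import Optional

Route = dict  # hypothetical tree node: {"target": str, "precursors": [Route, ...]}


def find_route(target: str,
               disconnections: dict[str, list[tuple[str, ...]]],
               purchasable: set[str],
               excluded: frozenset[str] = frozenset(),
               max_depth: int = 6) -> Optional[Route]:
    """Depth-first search for one route that ends in purchasable compounds."""
    if target in purchasable and target not in excluded:
        return {"target": target, "precursors": []}  # stop: buy this compound
    if max_depth == 0:
        return None  # depth limit also guards against cyclic rule tables
    for precursors in disconnections.get(target, []):
        subroutes = []
        for p in precursors:
            sub = find_route(p, disconnections, purchasable, excluded, max_depth - 1)
            if sub is None:
                break  # this disconnection fails; try the next one
            subroutes.append(sub)
        else:
            return {"target": target, "precursors": subroutes}
    return None


if __name__ == "__main__":
    rules = {"T": [("A", "B")], "A": [("C",)]}  # hypothetical disconnections
    stock = {"A", "B", "C"}
    print(find_route("T", rules, stock))                              # A is bought
    print(find_route("T", rules, stock, excluded=frozenset({"A"})))  # A is made from C
```

A real system would enumerate many routes and score them rather than return the first success, but the exclusion check sits at the same place: the leaf test.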
Below is an example of a synthetic route found by ARChem for Maraviroc, an HIV drug that was developed in Pfizer’s labs in Sandwich, UK, and received FDA approval in 2007. ARChem’s solution includes 9 reactions, with 6 steps in each of the two longest paths. In this case, the retrosynthetic analysis leads all the way back to commercially available starting materials, shown with their corresponding providers and catalog numbers. ARChem supplies a lot more information to complete the experimental details of the synthetic scheme, such as reaction conditions, bibliographic references, and additional starting-material providers and catalog numbers.
The above suggested synthetic route has been generated completely automatically with no user intervention. It is a strong demonstration of the huge potential of this concept, and of the accomplishments so far. We look forward to 2010 with plenty of items in the ARChem pipeline, and we are particularly eager to continue the dialogue with our industrial and academic users – a scientific exchange that guarantees that the development process maintains continuous, rigorous and coherent progress.
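The route statistics quoted for Maraviroc (9 reactions, 6 steps along each of the two longest paths) are properties of the retrosynthetic tree: the reaction count is the number of non-leaf nodes, and the longest linear sequence is the deepest chain of reactions from a starting material to the target. A small sketch over a hypothetical nested-dictionary tree, in which leaves are purchased starting materials; this representation is an assumption for illustration, not ARChem’s data model.

```python
def count_reactions(node: dict) -> int:
    """Every non-leaf node is produced by exactly one reaction."""
    if not node["precursors"]:
        return 0  # purchased starting material
    return 1 + sum(count_reactions(p) for p in node["precursors"])


def longest_linear_sequence(node: dict) -> int:
    """Length, in steps, of the longest chain from a starting material to this node."""
    if not node["precursors"]:
        return 0
    return 1 + max(longest_linear_sequence(p) for p in node["precursors"])


def leaf(name: str) -> dict:
    return {"target": name, "precursors": []}


if __name__ == "__main__":
    # Hypothetical convergent route: T <- (X <- Y <- Z) + (W <- V)
    route = {"target": "T", "precursors": [
        {"target": "X", "precursors": [{"target": "Y", "precursors": [leaf("Z")]}]},
        {"target": "W", "precursors": [leaf("V")]},
    ]}
    print(count_reactions(route), longest_linear_sequence(route))  # 4 3
```

In a convergent synthesis the two numbers differ, as they do for Maraviroc: branches prepared in parallel add reactions without lengthening the longest linear sequence.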
posted by Orr Ravitz
eHiTS 2009 as a Blind Docking Tool
Tuesday, December 1st, 2009
As the molecular docking paradigm solidifies its status as a significant tool for drug discovery, chemists are exploring additional applications of the methods in ways that sometimes stretch the existing algorithms to their limits. Most docking programs, including eHiTS, have not been designed or optimized to perform blind docking. In structure-based drug discovery, the user is typically expected to define, at some level of accuracy, the binding pocket in the target of interest. The binding site is determined either from the known binding modes of ligands in crystal structures of complexes, or from an educated hypothesis. There are cases, however, in which assumptions about the possible locations of binding hot spots are difficult to make or should be avoided altogether. This is the case, for example, when secondary binding sites are suspected, or when one would like to screen active ligands and other compounds against a range of targets to estimate the likelihood of side-effects, toxicity, and other types of biological activity.
The standard eHiTS usage requires a rough definition of the binding pocket, which is supplied through the clip file. This file should contain at least two sets of coordinates (two spatial points) located in the designated binding pocket. eHiTS then draws a box around those points, expands it to some extent in all directions, and places the search grid inside that box. Next, the box is “flooded” with a virtual fluid to detect all the cavities that will define the binding surface. This is a highly automated process, but it still relies on the user-defined clipping. Commonly the native ligand, amino acids from the binding pocket, or a few atoms from either are chosen as the clip file. If eHiTS is run with the -complex option, the clipping coordinates are inferred from the native ligand. However, eHiTS can also be used without any clipping. In this case, the entire receptor is considered for docking: the whole protein is flooded, and sufficiently deep clefts are searched for on its surface. The final space in which docking is performed is defined by the interconnected pockets found on the target. The search grid in such scenarios is typically large, and extensive sampling is required. Nevertheless, the computational efficiency of the eHiTS algorithm allows good sampling in reasonable timescales.
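The first part of the clipping procedure described above, a box around the clip points that is padded on all sides and filled with a search grid, can be sketched in a few lines. The margin and grid spacing below are placeholder values chosen for the example, not eHiTS defaults, and the cavity “flooding” step is not modelled at all.

```python
import itertools
import math

Point = tuple[float, float, float]


def clip_box(points: list[Point], margin: float = 4.0) -> tuple[Point, Point]:
    """Axis-aligned box around the clip points, expanded by `margin` Å on every side."""
    if len(points) < 2:
        raise ValueError("the clip file should provide at least two points")
    lo = tuple(min(p[i] for p in points) - margin for i in range(3))
    hi = tuple(max(p[i] for p in points) + margin for i in range(3))
    return lo, hi


def grid_points(lo: Point, hi: Point, spacing: float = 1.0) -> list[Point]:
    """Regular grid of candidate positions inside the box, starting at the lo corner."""
    axes = [
        [lo[i] + k * spacing for k in range(math.floor((hi[i] - lo[i]) / spacing) + 1)]
        for i in range(3)
    ]
    return list(itertools.product(*axes))


if __name__ == "__main__":
    clip = [(10.0, 5.0, -2.0), (14.0, 7.0, 1.0)]  # e.g. two atoms of a native ligand
    lo, hi = clip_box(clip)
    grid = grid_points(lo, hi, spacing=0.5)
    print(lo, hi, len(grid))
```

The grid size grows with the cube of the box edge divided by the spacing, which is why unclipped, whole-protein runs need the much larger grids and more extensive sampling mentioned in the text.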
Several eHiTS users expressed specific interest in blind docking in recent months, so we decided to evaluate eHiTS’ performance in this context. We used the test set from an earlier blind docking evaluation (Hetenyi and van der Spoel, 2006 [1]). We focused on the 43 complexes used in the paper and did not attempt to use the apo structures. Three codes (1B70, 1FIW and 1QIZ) were left out because of uncertainty regarding the exact structure used in the paper for docking, leaving 40 complexes. The default accuracy (3) was used throughout the study. The average blind docking time was 9 minutes per receptor for this set.
Results:
77.5% of the cases (31 of 40) gave at least one conformation under 2 Å in the top 10 poses. In the other cases, one accumulative docking round, using poses from the first round as clip files, produced successful binding modes in the top 5 poses. In most cases the top-ranked pose is in the correct binding pocket, offering a good starting point for pose refinement.
The table here details the results for the specific codes. The Job# column indicates whether the results were obtained with a single blind docking run or with 2 cycles. The Rank# and RMSD columns give the rank of the first pose under 2 Å and its RMSD from the crystallographic conformation. The last two columns give the RMSDs of the top-ranked and closest poses.
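The table columns described above (rank of the first pose under 2 Å, its RMSD, and the top-ranked and closest-pose RMSDs) can all be derived from each complex’s score-ordered list of pose RMSDs. A minimal sketch, with the 2 Å cutoff taken from the text and everything else, including the `PoseSummary` name, chosen for illustration:

```python
from typing import NamedTuple, Optional, Sequence


class PoseSummary(NamedTuple):
    first_rank_under_cutoff: Optional[int]  # 1-based rank; None if no pose qualifies
    rmsd_at_that_rank: Optional[float]
    top_rank_rmsd: float
    closest_rmsd: float


def summarize_complex(ranked_rmsds: Sequence[float], cutoff: float = 2.0) -> PoseSummary:
    """ranked_rmsds holds pose RMSDs (Å) in the docking program's score order."""
    if not ranked_rmsds:
        raise ValueError("no poses")
    rank, value = next(
        ((i + 1, r) for i, r in enumerate(ranked_rmsds) if r < cutoff), (None, None)
    )
    return PoseSummary(rank, value, ranked_rmsds[0], min(ranked_rmsds))


if __name__ == "__main__":
    print(summarize_complex([5.1, 3.2, 1.4, 0.9]))
```

Reporting the first qualifying rank, rather than only the top pose, separates two failure modes: a search that never finds the right pose, and a scoring function that finds it but ranks it too low.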
The blind docking of phenol into insulin (1MPJ) is shown in Picture 1 below. The crystallographic pose is shown in cyan, and sample poses are shown in the “hot spots” detected during docking. Those poses can be used to clip the receptor in accumulative docking runs in which the sampling is finer and the binding pockets are better modelled. It should be noted that this code generates an unusually large number (5) of hot spots. In most cases in the set we observed three, two, or often just one hot spot, reflecting detection of the correct binding pocket.

Picture 1: Phenol binding to insulin. Several potential binding pockets are detected for this small ligand.
1NGP (N1G9 FAB fragment) is a case where the majority of poses are generated far from the native ligand. Picture 2 below shows that most of the poses are located in the big cavity between chains L and H of the crystal structure. Several poses, however, reproduce the X-ray binding mode (in cyan) with close to 1 Å RMSD.

Picture 2: 2-(4-hydroxy-3-nitrophenyl)acetic acid docked into N1G9 FAB fragment. The majority of poses are located in the big cavity between chains L and H.
Conclusions:
The above results clearly demonstrate the viability of eHiTS as a blind docking tool. In all cases the correct binding pocket was identified within the top 32 solutions, and in most cases good poses under 2 Å, and even under 1 Å, were found at the top of the generated poses. The conformations may be further refined by clipping the receptor for subsequent runs, and by working at higher accuracies. As always in eHiTS, the jobs are extremely easy to set up with a simple command line, with no required preparation of the receptor or the ligand. This, together with the speed of the calculations, makes eHiTS a high-throughput blind docking solution.
Reference:
[1] Hetenyi, C.; van der Spoel, D.: “Blind docking of drug-sized compounds to proteins with up to a thousand residues.” 2006 Feb 20; 580(5):1447-50. Epub 2006 Jan 31.
Table: Blind docking results for eHiTS 2009.1, using the test set of Hetenyi et al. [1]
by Orr Ravitz
eHiTS 2009.1 is released
Friday, November 27th, 2009
We are pleased to announce the release of the 2009.1 docking and screening portfolio, which includes the eHiTS, Tune, Score and LASSO packages for the Intel and Cell platforms. This is an important bug-fix release that resolves instability issues, improves docking accuracy, and introduces several new features.
One illustration of the progress is the Top Ranking and Closest Pose accuracy analysis on the Astex 85 [1, 2] test set (figures below) where 5-10% improvement can be observed in almost every category:


Overall, the improvement between 2009.0 and 2009.1 is 0.36 Å (i.e. 17%) in top-ranking average RMSD, and 0.17 Å (i.e. 18%) in closest-pose average RMSD.

For a more detailed report on what is new in the package, please see the release notes under the general docs pages: http://www.simbiosys.ca/docs/
References:
Ref 2: Astex diverse dataset: http://www.ccdc.cam.ac.uk/products/life_sciences/gold/validation/astex_diverse/ (download: http://www.ccdc.cam.ac.uk/products/life_sciences/gold/validation/downloads/download.php4)


