Score tuning, available in eHiTS, is gaining ground in docking
You may have come across a recent paper by a group of researchers from UCSD, Leeds and Stony Brook that used eHiTS to identify targets for drug repurposing. The paper, “A Machine Learning-Based Method To Improve Docking Scoring Functions and Its Application to Drug Repurposing” (http://pubs.acs.org/doi/abs/10.1021/ci100369f), introduces an “inverse screening” scenario in which one searches for receptors that may bind a compound, in this case a known drug, and thereby suggest new therapeutic uses for the molecule. The authors demonstrate, both experimentally and with benchmarks, how customizing the score to the receptors of interest improves the ability to identify active compounds and to give a rough estimate of their relative activity.
The authors chose eHiTS not only because of its good performance as a docking tool, but also because it facilitates the score tuning exercise. eHiTS reports not just a single score value but also the individual terms, 20 in total, that build up this value. The study examined various ways to recombine some of the score terms so that receptor-specific properties can be reproduced with improved accuracy.
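To make the idea of recombining score terms concrete, here is a minimal Python sketch of the simplest case, a weighted sum. The term names, values and weights are hypothetical and chosen only for illustration; they are not the actual eHiTS terms, and this is not the method used in the paper.

```python
from typing import Dict


def recombine(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of score terms; terms without a weight are ignored."""
    return sum(w * terms[name] for name, w in weights.items() if name in terms)


# Hypothetical per-pose terms (illustrative names and values only).
pose_terms = {"hbond": -2.0, "hydrophobic": -1.5, "steric_clash": 0.5}

# Equal weights stand in for an untuned, out-of-the-box combination.
default_weights = {"hbond": 1.0, "hydrophobic": 1.0, "steric_clash": 1.0}

# Receptor-specific weights, as might be fitted to data for one target.
tuned_weights = {"hbond": 2.0, "hydrophobic": 0.5, "steric_clash": 1.0}

default_score = recombine(pose_terms, default_weights)
tuned_score = recombine(pose_terms, tuned_weights)
print(default_score, tuned_score)
```

Tuning then amounts to choosing the weights (or a more flexible combining function) so that the recombined score tracks the experimental data for the receptor of interest.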
An erratum for the paper was published yesterday (http://pubs.acs.org/doi/abs/10.1021/ci2001346), revising all the “native” eHiTS results, i.e. the out-of-the-box results without additional tuning. The original paper displayed results that were not in line with those obtained by us and by others (see, for example: http://pubs.acs.org/doi/abs/10.1021/ci100374f), which prompted us to contact the authors. A few good words about the authors are in order. Once they heard our concerns, they acted swiftly and with full transparency to track down the problem, and once the source of the error was identified, they moved rapidly to publish the correction. This level of openness, integrity and cooperation should not be taken for granted, and we salute the team of researchers for their approach.
The root of the error was the misidentification of the relevant scoring value in the eHiTS output. As stated in the erratum, eHiTS’ output includes an Energy value and a Score value. The Score value is the one that should be used in pose prediction and virtual screening scenarios. It is a scoring scheme that is trained on PDB complexes and is designed to reproduce crystallographic poses with high fidelity. During training, many ligand poses are generated for each PDB complex, and the scoring function is optimized to produce a good score-RMSD correlation. Implicitly, this process involves both positive and negative data: the correct poses, which are the objective and are promoted, versus unrealistic poses, which are rejected and suppressed. The eHiTS-Energy, on the other hand, is a scoring scheme that is designed to rank-order known active molecules. It is trained to produce a correlation between score and binding affinity, and therefore it is trained on positive data only. Hence, the prospects of eHiTS-Energy differentiating between actives and inactives are slim, as demonstrated by the ROC charts of the original paper. eHiTS-Score, as shown in the erratum, strongly outperforms the Energy in almost all cases and shows good screening capabilities in most of them.
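For readers less familiar with ROC analysis, the following Python sketch shows how the area under the ROC curve (AUC) summarizes a scoring scheme's ability to separate actives from inactives. It assumes that lower scores indicate better predicted binding, which is a common docking convention but is an assumption here, and all score values are made up for illustration; they are not taken from the paper or from eHiTS output.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], inactive_scores: Sequence[float]) -> float:
    """AUC as the fraction of (active, inactive) pairs ranked correctly.

    Assumes a lower score is better; tied pairs count as half.
    """
    wins = 0.0
    for a in active_scores:
        for d in inactive_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Made-up scores: this scheme ranks most actives ahead of the inactives.
auc_good = roc_auc([-9.0, -8.0, -5.0], [-7.0, -6.0, -4.0, -3.0])

# Made-up scores: actives and inactives are interleaved, so the scheme
# separates them no better than chance.
auc_chance = roc_auc([-7.0, -5.0, -3.0], [-8.0, -6.0, -4.0, -2.0])

print(auc_good, auc_chance)
```

An AUC near 1.0 means actives are consistently ranked ahead of inactives, while a value near 0.5 means the ranking is no better than random.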
The two main conclusions of the paper are that (i) score tuning is a powerful approach to improving docking results, specifically in screening, and that (ii) non-linear methods for combining the scoring terms are superior to linear methods in this respect. We strongly support both observations. In fact, we learned those lessons during the development of eHiTS’ scoring function, and eHiTS adopted these principles a couple of years ago. Family-based scoring is available for many cases, and non-linear methods are central to its implementation. For dozens of protein families for which several complexes are available in the PDB, eHiTS provides customized scoring, which is invoked automatically by analyzing the geometry of the receptor provided by the user. When the user’s target does not match any family in eHiTS’ knowledge base, a default scoring scheme is used. More about the tuning approaches in eHiTS can be found in this presentation:
http://www.simbiosys.com/science/presentations/2010-03-acs/eHiTS_239_ACS_website.pdf
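The family-based selection with a default fallback described above can be pictured with a small Python sketch. Everything in it is hypothetical: the family name, the term names and both scoring functions are invented for illustration, and the step that matches a receptor to a family from its geometry is assumed to have already happened. It does not reflect how eHiTS is actually implemented.

```python
from typing import Callable, Dict, Optional

ScoreFn = Callable[[Dict[str, float]], float]


def default_scoring(terms: Dict[str, float]) -> float:
    """Fallback scheme: a plain linear sum of the terms."""
    return sum(terms.values())


def example_family_scoring(terms: Dict[str, float]) -> float:
    """Hypothetical family-specific scheme with a non-linear cross term."""
    hbond, hydrophobic = terms["hbond"], terms["hydrophobic"]
    return hbond + hydrophobic - 0.5 * hbond * hydrophobic


# Hypothetical knowledge base mapping family names to scoring schemes.
FAMILY_SCORERS: Dict[str, ScoreFn] = {"example_family": example_family_scoring}


def select_scoring(family: Optional[str]) -> ScoreFn:
    """Return the family scheme if the receptor matched one, else the default."""
    if family is None:
        return default_scoring
    return FAMILY_SCORERS.get(family, default_scoring)


terms = {"hbond": -2.0, "hydrophobic": -1.5}
matched = select_scoring("example_family")(terms)
unmatched = select_scoring(None)(terms)
print(matched, unmatched)
```

The point of the sketch is only the control flow: a matched family gets its own tuned (and possibly non-linear) scheme, and everything else falls back to the general-purpose one.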
We are pleased to see that score customization is increasingly recognized as a promising path for improving molecular docking. The upcoming version of eHiTS will include even more sophisticated methods of utilizing experimental data. More about this in future posts.
Posted by Orr
