Most flexible-ligand/rigid-protein and flexible-ligand/flexible-protein docking approaches do predict poses that reproduce known structural solutions among their top-ranked poses. But when I assess diverse top-scoring poses for ligands in a screening exercise, I find it time-consuming, even with auxiliary pharmacophoric information, to decide which of those top-scoring poses are the most pharmacologically relevant.
Let me simplify that statement! What I would really like is confidence that, if I took my ‘top 5-10 poses’, the biologically relevant pose would be much more likely to be in that group than in the ‘next ten’. Moreover, we would all like our pose scores to bear some resemblance to the IC50/Kd rankings of our screened ligands, whether they are agonists, inverse agonists, or inhibitors at the pharmacological endpoint!
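The ‘top 5-10 versus next ten’ question can be made concrete as a top-k success rate over a set of ligands: rank each ligand's poses by docking score and ask whether any pose in a given rank window lies within an RMSD cutoff of the crystallographic pose. The sketch below is a minimal, hypothetical illustration, not an eHiTS feature; it assumes lower scores are better and uses a 2.0 Å cutoff (a common convention, not something specified in this post). The function names and ligand data are made up for the example.

```python
from typing import Dict, List, Tuple

# Each pose is (docking_score, rmsd_to_crystal_pose_in_angstrom).
# Assumption: lower (more negative) scores are better.
Pose = Tuple[float, float]


def hit_in_window(poses: List[Pose], start: int, stop: int,
                  rmsd_cutoff: float = 2.0) -> bool:
    """True if any pose ranked in [start, stop) by score is within rmsd_cutoff."""
    ranked = sorted(poses, key=lambda p: p[0])  # best (lowest) score first
    return any(rmsd <= rmsd_cutoff for _, rmsd in ranked[start:stop])


def success_rate(ligands: Dict[str, List[Pose]], start: int, stop: int,
                 rmsd_cutoff: float = 2.0) -> float:
    """Fraction of ligands with a near-native pose in the given rank window."""
    hits = sum(hit_in_window(p, start, stop, rmsd_cutoff) for p in ligands.values())
    return hits / len(ligands)


if __name__ == "__main__":
    # Hypothetical toy data for illustration only (not real docking results).
    ligands = {
        "lig_a": [(-9.1, 1.2), (-8.7, 4.5), (-8.0, 6.1), (-7.5, 3.9)],
        "lig_b": [(-7.9, 5.3), (-7.7, 1.6), (-7.0, 7.2), (-6.8, 0.9)],
        "lig_c": [(-6.5, 1.8), (-6.4, 5.9), (-6.0, 6.3), (-5.9, 8.0)],
    }
    print("top 2 :", success_rate(ligands, 0, 2))
    print("next 2:", success_rate(ligands, 2, 4))
```

On the toy data the top two ranks give a success rate of 1.0 and the next two give 1/3, which is the kind of gap one would hope to see between the ‘top 5-10’ and the ‘next ten’ with a well-behaved scoring function.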
I am a guy who early on naively believed that docking solutions relying on physics-based approaches, for example carefully developed charge sets with 1-6-12 (Coulomb plus Lennard-Jones) potentials, could get me there. After all, I have done numerous MD simulations and employed MMPBSA approaches to estimate binding free energies to good effect (Proteins 55:895-914). But this worked well only if I had the right ligand pose(!) and if either the enthalpic components dominated or my crude estimate of the entropic terms was adequate. The bottom line is that most docking approaches, while doing OK on pose prediction, do not, to date, give you good score-based separation of good (low-RMSD) and bad (high-RMSD) poses, or much confidence in using docking scores to ‘rank’ prospective ligands for synthesis (J. Med. Chem. 49:5912-5931).
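The ‘RMS’ in that separation is the root-mean-square deviation of a docked ligand pose from the crystallographic pose, RMSD = sqrt((1/N) Σ |x_i - x_i,ref|²) over the N ligand heavy atoms. A minimal sketch follows; it assumes both poses are already in the receptor frame (so no superposition is done) and that the atoms are listed in the same order. Production tools also handle symmetry-equivalent atoms, which this sketch deliberately ignores; `pose_rmsd` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple

# (x, y, z) coordinates in angstroms.
Coord = Tuple[float, float, float]


def pose_rmsd(docked: Sequence[Coord], crystal: Sequence[Coord]) -> float:
    """In-place heavy-atom RMSD between a docked pose and the crystal pose.

    Assumptions (for illustration): both poses are already in the receptor
    frame, so no superposition is performed; atoms appear in the same order;
    symmetry-equivalent atoms are NOT handled.
    """
    if not docked or len(docked) != len(crystal):
        raise ValueError("poses must list the same, non-zero number of atoms")
    squared = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(docked, crystal)
    )
    return math.sqrt(squared / len(docked))


if __name__ == "__main__":
    # Hypothetical two-atom ligand shifted 2 Å along z from its crystal pose.
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    docked = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0)]
    print(pose_rmsd(docked, crystal))  # 2.0
```

Skipping superposition is intentional: a docked pose that is correct in shape but sits in the wrong place in the pocket should be penalized, and aligning it onto the crystal pose first would hide that error.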
eHiTS is an informatics-based approach (J. Mol. Graphics Mod. 26:198-212), and what I have learned is that it is powerful in providing me with two major items on ‘my Christmas wish list’:
1. Good score-RMSD correlation (good scores have low RMSD), and
2. Good correlation with ln(IC50)/ln(Kd)!
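Both wish-list items are correlations: item 1 between docking score and RMSD across the poses of a ligand, item 2 between score and ln(IC50) or ln(Kd) across ligands. The logarithm is the natural scale because the standard binding free energy is linear in it, ΔG° = RT ln(Kd/c°) with c° = 1 M. Below is a minimal plain-Python sketch of Pearson and Spearman (rank) correlation; the scores and Kd values in the usage block are hypothetical. With lower scores meaning better binding, a useful scoring function should correlate positively with both RMSD (good scores, low RMSD) and ln(Kd) (good scores, small Kd).

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical values for illustration only (not measured data).
    scores = [-9.2, -8.1, -7.4, -6.9, -5.5]      # docking scores, lower = better
    kd_molar = [2e-9, 4e-8, 1e-7, 3e-6, 1e-5]     # dissociation constants (M)
    ln_kd = [math.log(k) for k in kd_molar]
    print("Pearson  score vs ln(Kd):", round(pearson(scores, ln_kd), 3))
    print("Spearman score vs ln(Kd):", spearman(scores, ln_kd))
```

Spearman is often the more relevant number when the goal is only to rank prospective ligands for synthesis, since it ignores the scale of the scores; here the toy scores and ln(Kd) values are perfectly monotonic, so Spearman is exactly 1.0.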
What I have learned this month is that if I ‘train a customized scoring function’ for my pharmacological protein target using ~5 co-crystallized complexes, I can achieve both items on my Christmas wish list. That is powerful.
How about an example of this?
Nicotinic acetylcholine receptors (nAChRs) are an important class of proteins within a superfamily of ligand-‘gated’ (allosterically modulated) ion channels. One surrogate protein with binding motifs and pharmacology analogous to nAChRs is the acetylcholine binding protein (AChBP). We have begun investigating this class of proteins because it presents a challenging ligand-recognition problem: ligands are recognized by conserved aromatic motifs in the binding pocket, through cation interactions with a ‘box’ of Trp/Tyr residues.

The upper left panel of the figure below shows three classic cationic ligands: acetylcholine, nicotine, and carbamylcholine (CCE). All three bind with considerable affinity to the binding pocket, with their cationic centers interacting with the Trp/Tyr aromatic residues. Considerable experimental and computational evidence suggests that cation-pi interactions make a substantial contribution to the binding free energy of these ligands to AChBP. The upper right panel shows this interaction motif for CCE in the crystal structure 1UV6. The lower left panel shows superimposed eHiTS docked poses of nicotine (NIC) and carbamylcholine (CCE) interacting with the surrounding pi-electron system.

Although these poses receive low scores with the default ‘out-of-the-box’ scoring function, the training of that informatics-based function did not include this class of proteins, and the top plot (labeled ‘untrained’) shows no correlation between ‘score regime’ and low-RMSD poses (RMSD relative to the crystallographic pose). What happens to the score separation of low-RMSD (good) and high-RMSD poses if you train a scoring function specific to this class of proteins? The lower plot in the lower right panel tells the story: you obtain score-based separation (correlation) of your low-RMSD poses from your high-RMSD poses.
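One way to put a single number on this kind of score-based separation is a ROC-style AUC: label poses within an RMSD cutoff of the crystal pose as good and the rest as bad, then compute the probability that a randomly chosen good pose scores better than a randomly chosen bad one (equivalently, the normalized Mann-Whitney U statistic). This is a generic metric, not necessarily how the plots described above were analyzed; the 2.0 Å cutoff, the lower-is-better score convention, the `separation_auc` name, and the example scores are all assumptions for illustration.

```python
from typing import List, Tuple

# Each pose is (docking_score, rmsd_to_crystal_pose_in_angstrom).
Pose = Tuple[float, float]


def separation_auc(poses: List[Pose], rmsd_cutoff: float = 2.0) -> float:
    """Probability that a random low-RMSD pose outscores a random high-RMSD pose.

    Assumes lower scores are better; ties count as half a win.
    1.0 = perfect separation, 0.5 = no better than random.
    """
    good = [s for s, r in poses if r <= rmsd_cutoff]
    bad = [s for s, r in poses if r > rmsd_cutoff]
    if not good or not bad:
        raise ValueError("need at least one pose on each side of the cutoff")
    wins = 0.0
    for g in good:
        for b in bad:
            if g < b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))


if __name__ == "__main__":
    # Hypothetical scores for the same four poses under two scoring functions.
    untrained = [(-6.0, 1.1), (-7.5, 5.8), (-7.6, 0.9), (-6.5, 6.4)]
    trained = [(-8.2, 1.1), (-5.9, 5.8), (-7.7, 0.9), (-6.1, 6.4)]
    print("untrained AUC:", separation_auc(untrained))
    print("trained AUC:  ", separation_auc(trained))
```

For the toy data, the untrained scores give 0.5 (no separation, like the ‘untrained’ plot) and the trained scores give 1.0. The all-pairs loop is quadratic, which is fine for a few hundred poses; for large sets a rank-based Mann-Whitney computation would be faster.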
Which scoring function would you want to use for virtual screening of new leads for this class of protein? I think you know the answer to that one. You would want the one that gives you confidence that your top-scoring poses are the ones most likely to have low RMSD relative to the actual pharmacological pose.
Come back for the second part of the story, eHiTS score vs. ln(Kd) correlations, in my next blog post.
Posted by DLH