Cognate docking accuracy measurement vs pre-optimized pose

I have seen some papers published in peer-reviewed journals where the authors proposed (and executed) the following evaluation protocol for cognate docking:

  1. Optimize the X-ray crystal structure of the protein-ligand complex in order to remove any severe clashes, fix bad geometries, etc. Save the protein receptor and the ligand into separate files. Note that the optimization is performed on the complex as a whole, not separately for receptor and ligand!
  2. Perform the docking into the protein structure obtained from the optimization in step 1. The starting conformation of the input ligand may be randomized, or optimized in vacuum or in solvent.
  3. Compute the Root Mean Square Deviation (RMSD) between the heavy atoms of the solution pose and the pose saved in step 1 after optimization together with the receptor.
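As a minimal sketch of the measurement in step 3, assuming each pose is a list of (element, x, y, z) tuples in the same atom order: the function name `heavy_atom_rmsd` and the pose representation are illustrative choices, not taken from any of the papers in question. No superposition is performed, since both poses are assumed to share the receptor coordinate frame.

```python
import math


def heavy_atom_rmsd(pose_a, pose_b):
    """RMSD over non-hydrogen atoms of two poses with identical atom ordering.

    Each pose is a list of (element, x, y, z) tuples (an assumed, illustrative format).
    """
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    squared_sum = 0.0
    heavy_count = 0
    for (elem_a, xa, ya, za), (elem_b, xb, yb, zb) in zip(pose_a, pose_b):
        if elem_a != elem_b:
            raise ValueError("atom ordering differs between poses")
        if elem_a == "H":
            continue  # hydrogens are excluded from a heavy-atom RMSD
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        heavy_count += 1
    if heavy_count == 0:
        raise ValueError("no heavy atoms to compare")
    return math.sqrt(squared_sum / heavy_count)


# Toy data: the docked pose is the reference shifted by 1 along x (hydrogen moved arbitrarily).
reference = [("C", 0.0, 0.0, 0.0), ("H", 1.0, 0.0, 0.0), ("O", 0.0, 1.0, 0.0)]
docked = [("C", 1.0, 0.0, 0.0), ("H", 5.0, 5.0, 5.0), ("O", 1.0, 1.0, 0.0)]
print(heavy_atom_rmsd(reference, docked))
```

The number itself is computed the same way in any protocol; what matters is which reference pose is passed in.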

Upon a quick surface-scan (i.e. without really analyzing or thinking about the meaning) this may even sound like a reasonable protocol. But if one looks a little deeper, it becomes clear that the method is very seriously flawed, especially if the docking procedure involves an optimization step using the same force-field or scoring function that is used in step 1.

The calculated RMSD value is simply the distance between two selected local optima of the scoring function and, as such, has very little to do with docking accuracy. To explain this statement, let’s take a simple hypothetical 2D function (the real docking pose search space is 6+n dimensional, where n is the number of rotatable bonds, so it would be hard to visualize) and follow what happens if we optimize some points (marked P and X in the figure) by following the steepest path to a local maximum. The figure is drawn with peaks rather than valleys, but the argument is the same for minima.

[Figure: 2D function]

Suppose X represents the original X-ray ligand pose, and the points marked P represent various docking poses generated prior to local optimization. The black arrows show where the points end up after optimization. The distance indicated in white is the measured RMSD.

You can see that each local optimum has an attraction region, and if you move the starting P or X points around within the same region, they still end up at the same point after optimization. Thus the “measure” isn’t very sensitive to the P or X locations. You can also see that, in the particular case I have drawn, there is another P position to the right of X which was in fact closer to X prior to optimization than the one judged closest after optimization.

It is also clear that if the pose generation sampling is fine enough to create one pose P within the attraction region of each local optimum (on each mountain of the figure), then the measured RMSD will be zero, because X converges to the same peak as the P that fell onto the same mountain, even if that P started on the other side of it, quite far away. In other words, a sufficiently fine (exhaustive) sampling is guaranteed to produce a zero-RMSD solution!
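The attraction-region argument can be reproduced numerically. The sketch below is a toy, not any real scoring function: it assumes a 1D "score" cos(x) with peaks at multiples of 2π, and the names `optimize`, `x_ray`, `p_near` and `p_far` are made up for illustration. Plain gradient ascent stands in for the local optimizer.

```python
import math


def optimize(x, step=0.1, iterations=2000):
    """Gradient ascent on the toy score f(x) = cos(x); its derivative is -sin(x)."""
    for _ in range(iterations):
        x += step * (-math.sin(x))
    return x


x_ray = 2.5    # "X": the raw X-ray pose
p_near = 3.5   # a pose close to X, but just across the valley at pi
p_far = -2.5   # a pose far from X, on the other side of the same peak

for name, p in (("p_near", p_near), ("p_far", p_far)):
    before = abs(p - x_ray)
    after = abs(optimize(p) - optimize(x_ray))
    print(f"{name}: distance before = {before:.3f}, after = {after:.3f}")
```

Here `p_far` starts five units away from the X-ray position yet lands on the same peak, so its post-optimization distance collapses to essentially zero, while `p_near`, which started much closer, is carried to the neighbouring peak.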

How fine would such sampling have to be? That depends only on how “rough” the scoring function is. If we choose a very well-behaved function that has only a single minimum, then it is enough to generate a single pose anywhere, and it will converge to the same point as the X-ray pose. If the previous 2D example wasn’t clear enough to convey this message, then look at the following figure, which shows a 1D function with a single minimum point.

[Figure: 1D function]

See, starting from any point, the optimization always ends up in the same place. Cool. Does that mean we could create a scoring function smart enough to give us a cheap way to reach zero-RMSD solutions under the above protocol? Of course it does!

Let me present you the perfect docking suite:

  1. First we define our scoring function, or force field. Let’s call it Origin Optimized Potential System (OOPS), where the energy of any structure is computed with the following formula:
    [Formula: OOPS energy function]
    We will use the OOPS scoring function to optimize the receptor-ligand complex prior to docking.
  2. For the second step we need a docking algorithm. Let’s call it Zero Rmsd Ideal Docking Engine (ZRIDE). ZRIDE simply assigns zero (0.0) to all atom coordinates of the ligand: (x,y,z) := (0,0,0)
  3. Now we are ready to calculate the RMSD between the OOPS-optimized X-ray pose and the generated docking pose. Since the OOPS energy function has a single minimum point at the origin, optimizing any X-ray pose moves all of its atoms to the origin, thus the RMSD from the docked pose is always ZERO (0.0).
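The three steps above can be sketched in a few lines. This is a toy reconstruction, not the actual executables: the energy expression used here (the sum of squared atom coordinates) is an assumption, chosen only because it has a single minimum of zero at the origin as the text requires, and the function names and coordinates are invented for illustration.

```python
import math


def oops_energy(coords):
    """ASSUMED form of the OOPS energy: sum of squared coordinates (single minimum at the origin)."""
    return sum(x * x + y * y + z * z for x, y, z in coords)


def oops_minimize(coords, step=0.25, iterations=200):
    """Gradient descent on the assumed energy; the gradient per coordinate is 2 * value."""
    pose = [list(atom) for atom in coords]
    for _ in range(iterations):
        for atom in pose:
            for k in range(3):
                atom[k] -= step * 2.0 * atom[k]
    return [tuple(atom) for atom in pose]


def zride_dock(coords):
    """ZRIDE as described in the text: every ligand atom is placed at (0, 0, 0)."""
    return [(0.0, 0.0, 0.0) for _ in coords]


def rmsd(pose_a, pose_b):
    """Plain coordinate RMSD between two poses with identical atom ordering."""
    squared_sum = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared_sum / len(pose_a))


# Made-up "X-ray" ligand coordinates, for illustration only.
xray = [(12.1, -3.4, 7.8), (13.0, -2.9, 8.5), (11.2, -4.6, 6.9)]
xray_min = oops_minimize(xray)
docked = zride_dock(xray)

print("score of raw X-ray pose:      ", oops_energy(xray))
print("score of minimized X-ray pose:", oops_energy(xray_min))
print("RMSD, docked vs minimized:    ", rmsd(docked, xray_min))
print("RMSD, docked vs raw X-ray:    ", rmsd(docked, xray))
```

The last two lines make the point of the whole exercise: against the pre-optimized reference the "docking" looks perfect, while against the raw X-ray coordinates it is as bad as one would expect.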

If you still do not believe me, just download the following Linux executable files (also available as a gzipped tar package) and try it out yourself on any receptor-ligand complex. The protocol to follow for the test:

Step 1. Minimize the X-ray structure in the context of the receptor:

./nanomodel -rec receptor.mol -lig xray.mol -min -out xray_min.mol

Step 2. Generate an input ligand from xray.mol using ANY energy minimizer tool (any force field or conformation generator). Suppose you have produced input.mol, which has the altered conformation. (You may also just copy xray.mol to input.mol if you wish to use the original coordinates.)

Step 3. Perform the docking calculation:

./zride -rec receptor.mol -lig input.mol -out result.mol

This will immediately report the RMSD value (always zero, i.e. 0.000 Å), but you can also use any external tool to compute the RMSD between result.mol and xray_min.mol. It is VERY important to compare against the minimized xray_min.mol and not the raw xray.mol, since the latter may have a very bad geometry according to the OOPS force field. You can verify the score difference between them by running:

./oops -rec receptor.mol -lig xray.mol -score
./oops -rec receptor.mol -lig xray_min.mol -score

You will see that the raw X-ray file has a large positive score (a repulsive conformational strain energy), while the minimized version has no strain, i.e. its score is zero. :)
This product suite has reached the ultimate docking accuracy under the protocol described above, the same one some peer-reviewed publications have followed. So I rest my case: if that protocol is acceptable to anyone, then look no further, download the perfect solution free of charge today!

Disclaimer: Any resemblance between the above program names and real software tools (living on the market or dead) is an incredible coincidence by random chance.
ZZ
