eHiTS®: Electronic High Throughput Screening

Frequently Asked Questions:

  1. Does the software predict the Active Site?
  2. If the protein is a hypothetical one, is there any provision to define the active/binding site or can it perform a blind docking?
  3. Do I need to add hydrogens to the protein?
  4. Should I add hydrogens to the input ligand?
  5. What does eHiTS do about protein flexibility?
  6. Can eHiTS work with peptide ligands?
  7. Why does eHiTS say "cannot convert" for some of my peptide ligands in PDB format?
  8. Why does eHiTS give me an error when it tries to convert the PDB file?
  9. eHiTS is giving strange results when I use PDB files generated with Sybyl, why?
  10. What kind of CHARGES should I save in Sybyl as MOL2 input file for eHiTS?
  11. What input formats does eHiTS accept?
  12. Do I need to convert my 2D ligands into 3D?
  13. How does eHiTS identify the file type?
  14. How are cofactors, water and metal ions considered during docking?
  15. How does it handle Water Molecules? (Freely rotatable, displaceable or oriented?)
  16. Can eHiTS handle metal ions including metalloproteins?
  17. Do I need to prepare the correct protonation state of the ligand?
  18. Can I assign protonation states to the molecules?
  19. Why are hydrogen atoms not included in resulting SDF files?
  20. Why do I get bond type 4 (aromatic) in my SDF output files?
  21. What does eHiTS do with chiral molecules?
  22. Why do I not get the ligand name in the output score file?
  23. Does the program give the RMS value in Docking results?
  24. How can I visualise the results of eHiTS?
  25. eHiTS on Ubuntu Linux, why is it not working there?
  26. Install says iteratively: "Package_Linux.bin: 32: cut_relative: not found". Can eHiTS (or any other SimBioSys package like CheVi, Lasso etc.) be installed on Ubuntu Linux?
  27. License Expired?

Tune package related questions:

  1. If I don't use the "-active" flag, how many complexes should I put into the list file for the Tune package? If I describe 2 complexes in the list file, Tune automatically trains with only two actives and 400 decoys.
  2. If I use "-active", how many active compounds should I prepare at minimum? And in that case, how many complexes does Tune need in the list file?
  3. Can Tune train pose validation like eHiTS 6.2? If not, is the purpose of Tune to improve enrichment?
  4. What are the properties of the decoy compounds?

Known Issues:

  1. Why do I get charge changes in SDF output? (for eHiTS 6.2)
  2. Why does the standalone "split" application from eHiTS 2009.1 not work for me?

Frequently Asked Questions

Q: Does the software predict the Active Site?

A: Predicting the active site may have two meanings:

  1. predicting where in the protein can the ligand dock, and
  2. predicting the exact geometry of a prespecified binding pocket.

Item (1) is covered in the next answer, on blind docking.

Regarding item (2), eHiTS certainly does this. There are two main ways to run eHiTS in this context: the -complex keyword, or the -ligand -receptor -clip combination. With -complex, the user lets eHiTS separate the ligand from the receptor, and eHiTS "clips" the protein around the ligand it finds. With the second option, the user specifies the general location of the binding pocket in the clip file, and eHiTS clips the protein around the coordinates supplied in that file. Clipping means that a box is created around the relevant coordinates and the search grid is placed in that box; the rest of the protein is ignored for the remainder of the calculation. After clipping, eHiTS "floods" the clip box and determines the surface of the binding pocket by detecting the interconnected cavities. This continuous cavity constitutes the binding pocket.
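The clipping step described above can be pictured with a small Python sketch. It only illustrates the general idea (an axis-aligned box around reference coordinates, with everything outside the box discarded); the function names and the margin value are hypothetical and are not taken from eHiTS, whose actual box construction is not documented here.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def clip_box(reference_atoms: List[Point], margin: float) -> Tuple[Point, Point]:
    """Axis-aligned box around the reference coordinates, padded by `margin`."""
    xs, ys, zs = zip(*reference_atoms)
    lower = (min(xs) - margin, min(ys) - margin, min(zs) - margin)
    upper = (max(xs) + margin, max(ys) + margin, max(zs) + margin)
    return lower, upper


def clip_receptor(receptor_atoms: List[Point], box: Tuple[Point, Point]) -> List[Point]:
    """Keep only the receptor atoms that fall inside the box."""
    lower, upper = box
    return [
        atom
        for atom in receptor_atoms
        if all(lower[i] <= atom[i] <= upper[i] for i in range(3))
    ]
```

The reference coordinates would come from the bound ligand in the -complex case, or from the clip file in the -ligand -receptor -clip case.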

Q: If the protein is a hypothetical one, is there any provision to define the active/binding site or can it perform a blind docking?

A: eHiTS was not designed for binding pocket detection, but it does a good job in that respect. If the user has separate files for the ligand and the receptor, running eHiTS with the command: ehits.sh -receptor protein_file -ligand ligand_file (that is, without the '-clip' option) invokes blind docking. eHiTS will attempt to bind the ligand everywhere in the protein and in most cases will find at least one possible binding site. We have seen cases where it correctly detected the main site and secondary sites as well.

For more information about Blind Docking, see our 2009 Technical Note or the Dec 2009 blog posting.

Q: Do I need to add hydrogens to the protein?

A: No, there is no need to add hydrogens. eHiTS will do that.

Q: Should I add hydrogens to the input ligand?

A: There is no need to add hydrogens to the ligand either; eHiTS will do that. If the input already has H atoms, eHiTS will use the given POSITIONS, but it may switch the protonation states, treating some of those positions as a lone pair rather than a hydrogen.

This feature ("use H when given, generate when not given") gives the user more control. If the user has generated the H positions from a reliable source (e.g. QM modelling, minimization), then it is better to use those. However, it is not worth using OpenBabel, Corina or another simple modelling tool to generate them, because the internal knowledge base of eHiTS will do as good a job or better, and the result will match its own training better.

Q: What does eHiTS do about protein flexibility?

A: We consider that eHiTS provides a soft representation of the receptor, because of the following three algorithmic solutions:

  1. The eHiTS scoring function takes advantage of the temperature factor information provided in the PDB files to give a more complete picture of the interaction. The program also uses the probability of the atom positions to create a derived empirical scoring function.
  2. eHiTS rotates the -OH groups of Ser, Thr and Tyr residues of the protein and also the -NH3+ group of Lys, i.e. the interaction flexibility of these groups is considered. Note: the heavy atoms of the main chain and side chains are not moved during the process.
  3. The steric clash, or van der Waals potential, is not modelled with a hard 6-12 potential as in most force fields, but with a softer quadratic potential.
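To see why a quadratic clash term is "softer" than a 6-12 potential, the following Python sketch compares a textbook Lennard-Jones 12-6 term with a generic quadratic overlap penalty. Both functions and all parameter values (r_min, epsilon, k) are illustrative assumptions; this is not the eHiTS scoring function.

```python
def lennard_jones_12_6(r: float, r_min: float = 3.5, epsilon: float = 1.0) -> float:
    """Textbook 12-6 potential in its r_min form; the minimum is -epsilon at r = r_min."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def soft_quadratic_clash(r: float, r_min: float = 3.5, k: float = 1.0) -> float:
    """Generic quadratic penalty: zero beyond r_min, grows with the squared overlap."""
    overlap = max(0.0, r_min - r)
    return k * overlap ** 2
```

At half the optimal distance the 12-6 term is already in the thousands of energy units, while the quadratic term stays in single digits, so a small steric overlap is tolerated rather than rejected outright.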

Q: Can eHiTS work with peptide ligands?

A: Yes, eHiTS can work with peptide ligands, and we see very good results with them (we have a pharmaceutical partner that uses eHiTS exclusively for peptides). They have seen very good performance with 8-10 residues, although the more residues you add, the longer the computation will take. eHiTS does not limit the number of rigid fragments, but results will be useless beyond 12 or so fragments.

Q: Why does eHiTS say "cannot convert" for some of my peptide ligands in PDB file format?

A: This happens only with input ligands in PDB files, because we do not yet properly handle some of the end residues (N-terminal portions of the peptide ligands) for peptides in PDB file format. The problem can be circumvented by supplying the peptide ligand in MOL2 file format (e.g. by converting from PDB to MOL2 format using OpenBabel).


Q: Why does eHiTS give me an error when it tries to convert the PDB file?

A: If you used some molecular visualization software to save the PDB file...

The Protein Data Bank has a very strict file format definition for PDB files: http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html. Unfortunately, most PDB files that are publicly available DO NOT follow the written standard of the format. In fact, several commercially available molecular visualization programs DO NOT adhere to these PDB file conventions. Software such as Quanta does not include the "CONECT" keyword when it saves the PDB file, although this keyword is essential for determining the correct connectivity of the atoms. Such programs use simple distance-based criteria on the coordinates to GUESS the connectivity of the atoms. eHiTS, however, requires the "CONECT" keyword to ensure the accuracy and integrity of the molecule.

So the take-home message is: if you use some molecular visualization software to manipulate the PDB file, DO NOT save it as a .pdb but instead save it in one of the other formats that eHiTS accepts: .mol, .mol2, .sd
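If you are unsure whether a PDB file written by another program still carries connectivity information, a quick pre-flight check is easy to script. The helper below is a hypothetical convenience, not part of eHiTS; it only reports whether any CONECT record is present and does not verify that the records are correct.

```python
def has_conect_records(pdb_text: str) -> bool:
    """Return True if at least one CONECT record appears in the PDB text."""
    return any(line.startswith("CONECT") for line in pdb_text.splitlines())
```

A file for which this returns False is a candidate for re-saving in one of the other accepted formats instead.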


If you downloaded the PDB file and used it as is, without changing or saving it in another program...

Unfortunately, even some of the PDB files in the Data Bank violate its own standards and contain inconsistencies or errors. Some automated error correction is already implemented in eHiTS, but there can be error scenarios that we have not yet discovered. For instance, there are some files in which the connections of the atoms are not correct, which will inherently cause an error in eHiTS. Our development team is working on this problem, and the fix will be announced as soon as it is available.

Please inform us about the PDB code if you run into input file conversion problems with original PDB files from the Data Bank.

Q: eHiTS is giving strange results when I use PDB files generated with Sybyl, why?

A: Sybyl, like some other molecular visualization tools, does not always stick to the strict PDB standard formatting. In the case of Sybyl, it does not write the element symbols in columns 77-78, and therefore the output does not distinguish between atoms with ambiguous atom labels, such as alpha carbons (CA) and calcium (CA). In this case it is better to use the default MOL2 output from Sybyl as input to eHiTS.
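The missing element field is easy to detect before docking. The sketch below lists the serial numbers of ATOM/HETATM records whose columns 77-78 are blank or absent; the function name is hypothetical, and the column positions follow the published PDB format (serial number in columns 7-11, element symbol in columns 77-78).

```python
from typing import List


def atoms_missing_element(pdb_text: str) -> List[int]:
    """Serial numbers of ATOM/HETATM records with an empty element field."""
    missing = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            element = line[76:78].strip()  # columns 77-78, zero-based slice
            if not element:
                missing.append(int(line[6:11]))  # columns 7-11
    return missing
```

A non-empty result suggests the file came from a tool that omits the element column, in which case the MOL2 route described above is the safer choice.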


Q: What kind of CHARGES should I save in Sybyl as MOL2 input file for eHiTS?

A: According to Tripos' MOL2 file format definition (from 2005) the charge type associated with a molecule in a mol2 file could have one of the following values:

NO_CHARGES, DEL_RE, GASTEIGER, GAST_HUCK, HUCKEL, PULLMAN, GAUSS80_CHARGES, AMPAC_CHARGES, MULLIKEN_CHARGES, DICT_CHARGES, MMFF94_CHARGES, USER_CHARGES

In eHiTS we accept ALL of the above charge types, and consequently use the charge values coming from the mol2 input file, except for the type:

NO_CHARGES <- this is the charge type field of the mol2 file

In that case, we completely ignore all the charge values in the file. We have recently seen some input files coming from Sybyl 8.0, that looked like this:

USER_CHARGES <- this is the charge type
INVALID_CHARGES <- this is the INTERNAL status bit for Sybyl


Files of this kind will be handled incorrectly by the eHiTS v9 series, because eHiTS will still use the charges even though the second line states that they are invalid. To avoid this, either set the charge values correctly in your mol2 file or, if you do not have such information about your system, use NO_CHARGES, and eHiTS will automatically calculate the charges for you.
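To spot such files in advance, you can read the charge type and status bits directly from the @<TRIPOS>MOLECULE record. The sketch below assumes the standard record layout (molecule name, counts, molecule type, charge type, then optional status bits); the function names are hypothetical and the check is not part of eHiTS.

```python
from typing import Optional, Tuple


def mol2_charge_info(mol2_text: str) -> Tuple[Optional[str], str]:
    """Return (charge_type, status_bits) from the first MOLECULE record."""
    lines = mol2_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE":
            record = []
            for entry in lines[i + 1:i + 7]:
                if entry.startswith("@<TRIPOS>"):
                    break  # next record type reached
                record.append(entry.strip())
            charge_type = record[3] if len(record) > 3 else None
            status_bits = record[4] if len(record) > 4 else ""
            return charge_type, status_bits
    return None, ""


def charges_look_unreliable(mol2_text: str) -> bool:
    """True when charges would be used although Sybyl flagged them as invalid."""
    charge_type, status_bits = mol2_charge_info(mol2_text)
    return charge_type != "NO_CHARGES" and "INVALID_CHARGES" in status_bits
```

Files flagged by charges_look_unreliable are the ones to fix by hand or to re-save with NO_CHARGES.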


Q: What input formats does eHiTS accept?

A: The following input file formats are supported:

  • MDL Molecular files (mol or sdf) - 3D only;

  • Protein Data Bank files (pdb);

  • Tripos Mol2 files (mol2) - 3D only;

  • Tagged Molecule Ascii (tma) - native eHiTS format;

  • Tagged Molecule Binary (tmb) - native eHiTS format.

Q: Do I need to convert my 2D ligands into 3D?

A: Yes, eHiTS works only with 3D ligand files. If your input has only 2D coordinates, please convert it to 3D with a tool such as Corina.
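A common sign of a 2D input is that every atom has a z coordinate of zero. The sketch below applies that heuristic to a single V2000 MOL block (atom count in the first three characters of the fourth line, z coordinate in characters 21-30 of each atom line). The function name is hypothetical, and the test is only a heuristic: a genuinely planar molecule lying in the z = 0 plane would also be reported as flat.

```python
def is_flat_molblock(mol_block: str) -> bool:
    """Heuristic 2D check: True if every atom in a V2000 MOL block has z == 0."""
    lines = mol_block.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: first field is the atom count
    z_values = [float(line[20:30]) for line in lines[4:4 + n_atoms]]
    return all(abs(z) < 1e-6 for z in z_values)
```

Ligands for which this returns True are candidates for 2D-to-3D conversion before docking.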


Q: How does eHiTS identify the file type?

A: The input file format is identified by the extension of the file name.  Some examples:

file.pdb - PDB file
file.mol - MDL Molecular file

Please DO NOT use any additional "." in the file name, because this will cause errors, e.g. file.name.tma
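File names can be checked against both rules before a run. The sketch below is a hypothetical helper, not part of eHiTS: it accepts a name only if it contains exactly one "." and the extension is one of those listed in the supported-formats answer. It assumes that extensions are matched exactly as written (lowercase) and it inspects only the file name, not the directory part of the path.

```python
import os

# Extensions taken from the list of supported input formats in this FAQ.
KNOWN_EXTENSIONS = {"mol", "sdf", "pdb", "mol2", "tma", "tmb"}


def check_input_name(path: str) -> bool:
    """True if the file name has a single '.' followed by a known extension."""
    name = os.path.basename(path)
    stem, dot, extension = name.rpartition(".")
    if not dot or not stem:
        return False  # no extension, or nothing before the dot
    if "." in stem:
        return False  # extra dot, e.g. file.name.tma
    return extension in KNOWN_EXTENSIONS
```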

Q: How are cofactors, water and metal ions considered during docking?

A: There are two cases here to consider:

  1. If you use the "-receptor" option, then ALL ATOMS of the receptor are treated equally: they all contribute to the steric grid that defines the shape of the cavity; perception assigns activity to the atoms (H-bond, hydrophobicity etc.); surface points are generated for those atoms that lie at the cavity surface; the types of the surface points are based on the activity assigned in perception; and docking positions fragments and scores ligand interactions against them.
    So, in a nutshell, it does not matter whether an atom is part of a protein residue, cofactor, salt ion, metal ion, water molecule or anything else. All atoms are perceived (based on the connection table), have properties assigned to the surface, and as such fully participate in the docking as long as they are atoms of the receptor molecule.
  2. If you use the "-complex" option to split a PDB file, we have a list of recognized co-factors, water and metals that will not be considered as ligand, but will be kept as part of the receptor. For example, the residue IDs that are recognized as co-factors are: "NAD", "NAP", "NAG", "CNA", "NDP", "FAD", "FMN", "HE0", "TYS", "BTB", "COA", "MAN", "LMU", "PLP", "HEM", "BTN", "HEA", "HAS", "MES". Any co-factor name that is not listed here may be treated as "ligand" by split (depending on its size relative to the real ligand), which could lead to incorrect splitting. However, this problem is completely bypassed if the -receptor option is used instead of -complex.
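Before relying on -complex, it can help to list the HETATM residue names in your PDB file that are not on the co-factor list quoted above. The sketch below is a hypothetical helper, not part of eHiTS. Excluding "HOH" as water is an assumption based on the usual PDB naming, and because this FAQ does not list the recognized metal names, metal ions will show up in the result alongside the real ligand.

```python
from typing import List

# Co-factor residue IDs quoted in this FAQ.
RECOGNIZED_COFACTORS = {
    "NAD", "NAP", "NAG", "CNA", "NDP", "FAD", "FMN", "HE0", "TYS", "BTB",
    "COA", "MAN", "LMU", "PLP", "HEM", "BTN", "HEA", "HAS", "MES",
}
WATER_NAMES = {"HOH"}  # assumption: standard PDB water residue name


def unlisted_het_residues(pdb_text: str) -> List[str]:
    """Sorted HETATM residue names that are neither listed co-factors nor water."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            residue = line[17:20].strip()  # residue name, columns 18-20
            if residue and residue not in RECOGNIZED_COFACTORS and residue not in WATER_NAMES:
                names.add(residue)
    return sorted(names)
```

If the result contains more than your intended ligand, preparing separate files and using -receptor avoids the ambiguity.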

Q: How does it handle Water Molecules? (Freely rotatable, displaceable or oriented?)

A: Water molecules in eHiTS are freely rotatable.

Q: Can eHiTS handle metal ions including metalloproteins?

A: Yes. eHiTS performs well in a variety of situations involving metal ions and metal-ion-chelating ligands.

Q: Do I need to prepare the correct protonation state of the ligand?

A: No, it does not need to be prepared. eHiTS handles all possible protonation states of the receptor and ligand in a single run!

The issue of protonation state is very important to the docking problem. Ligands and receptors with different protonation states can have dramatically different binding positions. However, it is common practice for many docking programs to ignore this issue and require that the user define a particular protonation state prior to running a docking experiment.

This approach may be fine for a re-docking experiment where there is experimental data to help the user identify the correct protonation state.  However in many cases there is no way for the user to know this. 

Protonation states of ligands and receptors are determined by the interaction between the two.  Thus for any particular receptor-ligand pair there will generally be one correct protonation state (although there are cases where multiple valid docking poses exists for different protonation states of a particular receptor/ligand pair).  However for a different ligand, the protonation state of the receptor may be altered, to reflect the characteristics of the ligand.  If a docking program were to pre-set the protonation state of the receptor then possible interactions with a ligand could be lost.  A better solution, with a more appropriate score, can be found only if the program is run with different protonation states (not necessarily the neutral or the lowest energy form).

eHiTS takes a unique approach to the protonation problem. eHiTS systematically evaluates all possible protonation states for the receptor and ligands, automatically for every receptor-ligand pair. It does this through the use of ambiguous property flags for positions that could be either protonated or deprotonated (i.e. have a lone pair). Then, during the docking algorithm, each state is evaluated and scored. The result is the only docking program that evaluates all possible protonation states for the receptor and ligand in a single run.

 

On a more practical level, this means that eHiTS may alter the protonation state of the input receptor and ligand to achieve the best possible binding score. For example, if a user enters a molecule with a carboxylic acid group in its neutral protonation state, then depending on the receptor environment, eHiTS may output the deprotonated carboxylate form, as this is the form often seen under physiological conditions.

For more information, see the technical notes on the Automatic Protonation state handling in eHiTS.

Q: Can I assign protonation states to the molecules?

A: Yes, you can.

Whenever the input files do not contain any hydrogen atoms, eHiTS evaluates the protonation state on the fly, as described in the answer to the previous question. If the user provides a specific protonation state in the input files, the automatic protonation state handling is still invoked unless the command line argument "-fixproto" is used. Without -fixproto, eHiTS treats the coordinates of the hydrogens in the input files as optional locations for protons and assesses whether each one is populated.

The user may wish to run the docking with the pre-assigned protonation state. In this case the user should add to the command line:

-fixproto ligand/receptor/both
where the user can choose whether to fix the protonation state for the entire system, or only for the ligand or the receptor.

Q: Why are hydrogen atoms not included in resulting SDF files?

A: In the current version of eHiTS, we are not outputting the protonation state that we are using in the scoring. The way we sample protonation is by using a local model only, no consideration for pH at all. If a location can be protonated or de-protonated, we score it as if it were either, then choose the protonation state that scores the best. Essentially, for a given pose we are giving it the "best score possible".
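The "best score possible" idea can be illustrated with a toy calculation. The sketch below is not the eHiTS scoring function: it assumes, purely for illustration, that each ambiguous site contributes an independent additive term and that lower scores are better. All names and numbers are hypothetical.

```python
from typing import List, Tuple


def best_case_score(fixed_terms: float, ambiguous_sites: List[Tuple[float, float]]) -> float:
    """Add, for each site, the better (lower) of its protonated and deprotonated scores.

    Each tuple in `ambiguous_sites` is (score_if_protonated, score_if_deprotonated).
    """
    return fixed_terms + sum(min(protonated, deprotonated)
                             for protonated, deprotonated in ambiguous_sites)
```

For example, with fixed terms of -5.0 and two ambiguous sites scoring (-1.0, -2.0) and (0.5, -0.5), the pose is reported with -7.5, the score of the most favourable combination of states.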

There are several problems when outputting the protonation states. One is that a protonation state is a property of the complex not just the ligand, therefore we would need to have the corresponding receptor information as well. We are working on incorporating this in CheVi (we will show both the receptor and ligand protonation).

The second is that as the protonation changes, tautomers could also change, and we currently do not have a mechanism for fixing the tautomers. We are finishing up a project to address this issue, and it should be in the next release.

So what we currently do is output (in default mode) either TMA or SDF files with no hydrogens. We then let the user "assume" the appropriate protonation state. We know this is not the best solution.

Note: If you use the convert utility after a docking run, convert sees only the ligand and has no information about the receptor or the scores, so it simply puts hydrogens everywhere they "normally" or "typically" are.


Q: Why do I get bond type 4 (aromatic) in my SDF output files?

A: The answer depends on your input file type. If the input was an MDL SDF or MOL file, eHiTS will keep the same bond types as in the input file. If the input was MOL2, the same thing happens, except that the MOL2 definition has an aromatic bond type ("ar"), so if an "ar" bond was in the input MOL2 file, the output SDF file will have the aromatic (i.e. "4") bond type. If the input was a PDB file, eHiTS will perceive all the aromatic bonds and save them as type "4" in the output SDF file.

Note: if the aromatic bond type in SDF causes a problem for a tool that you use after eHiTS, there are programs (like MOE) which have the ability to readjust the aromatic bond type to single and double alternating bonds. Please contact us for more details.

Q: Does the program give the RMS value in Docking results?

A: Yes, eHiTS automatically reports the RMS values when used for self docking with the -complex option. If the receptor and ligand are provided separately, the user should use the -rms flag to get RMS values reported. The order of atoms in the ligand file must be identical to that in the rms file.
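The atom-order requirement follows from how an RMS deviation between two poses is usually computed: atom i of one structure is compared with atom i of the other. The sketch below shows the standard formula without any superposition, as is typical when comparing a docked pose with a reference in the same receptor frame. It is a generic illustration, not the eHiTS implementation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(pose: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square deviation between two poses, paired by atom index."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("both poses need the same, non-zero number of atoms")
    squared = sum((a[i] - b[i]) ** 2
                  for a, b in zip(pose, reference)
                  for i in range(3))
    return math.sqrt(squared / len(pose))
```

If the two files list the same atoms in a different order, the pairs no longer correspond and the reported value is meaningless even for a perfect pose.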

Q: What does eHiTS do with chiral molecules?

A: eHiTS does not change the chirality of molecules. The algorithm preserves the chirality of the rigid fragments and handles it at join points, so the input chirality is preserved in the output.


Q: Why do I not get the ligand name in the output score file?

A: The answer again depends on your ligand input file type. If the input was Tripos' MOL2 file, the name is automatically detected and included into the scores.txt and best_scores.txt files.

If the input ligand file was an MDL SDF or MOL file, eHiTS needs to be told which tag holds the name. The syntax is:

ehits.sh [your regular parameters] -tagname TAG_NAME
where "TAG_NAME" is the label (or tag) that identifies the molecule's name in the sdf file; it must be enclosed in quotation marks if it contains spaces.


Q: How can I visualise the results of eHiTS?
A: With our own viewer, FREE for everyone:

  • CheVi® (Chemical Visualiser)
    Available for Linux only for now; Web-plugin and other platform support is coming soon.

With other tools:
Note: make sure that you use the "-out myresults.sdf" command line argument to produce an output file that can be viewed by most standard molecular modelling programs. Once you have the "myresults.sdf" file, you can use most standard 3D viewing programs:

  • MarvinView from ChemAxon;
  • PyMOL;
  • CACTVS;
  • MOE from CCG (Note: when opening eHiTS output SDF files in MOE you get one database entry for each result along with its eHiTS pose and score);
  • Insight II from Accelrys (Note: when opening eHiTS output files in Insight II, you get all the structures, but no data, i.e. no score and no pose number);
  • Maestro from Schrödinger;
  • Sybyl from Tripos (Note: select MACCS as the type of file to be opened).


Q: eHiTS on Ubuntu Linux, why is it not working there?
A: If you get a strange message such as:

1246: Syntax error. Bad substitution.

This happens on Ubuntu 6.10 and later, where the default system shell was changed from bash to "dash", which is not fully compatible with bash. Try running eHiTS with:
/bin/bash INSTALL_PATH/ehits.sh

and that should solve the problem.


Q: Install says iteratively: "Package_Linux.bin: 32: cut_relative: not found". Can eHiTS (or any other SimBioSys package like CheVi, Lasso etc.) be installed on Ubuntu Linux?
A: Yes. The problem is that Ubuntu does not use bash as the default shell, so please try installing the SimBioSys software package with:

[path_to_bash] [SimBioSys_package.bin] [WHERE_TO_INSTALL]
e.g.

/bin/bash /home/user/Download/CheVi_9.0_Linux.bin /home/user/SimBioSys/

and that should solve the problem.


Q: License Expired?
A: If a user has their license extended by SimBioSys support staff, but does not run eHiTS prior to the original expiry date, then eHiTS will think the license has expired. The user will get the following message:

WARNING! Your license will expire in -X days.
If you wish to use the software after the expiry, you need to contact
SimBioSys Inc. to request an extension, e.g. email support@simbiosys.com

To solve this problem, you must delete the grant file found in your ehits_work/license directory:

rm ~/ehits_work/license/*.grant


Tune package related questions


Q: If I don't use the "-active" flag, how many complexes should I put into the list file for the Tune package? If I describe 2 complexes in the list file, Tune automatically trains with only two actives and 400 decoys.
A: In principle, one should use as many complexes as possible. Two complexes are not a sufficient set for tuning, because the data is split into a set that is used for training and a set that is used for validation. You can look at the receptors.rkba file and see that most families have around 10 PDB codes, and I believe 5 or 6 complexes is roughly the minimum required to obtain a sensible tuned weight set. The ligands in the complexes are used as actives, and together with an automatically selected decoy set they are used to rescale the score with a LASSO-like scheme.


Q: If I use "-active", how many active compounds should I prepare at minimum? And in that case, how many complexes does Tune need in the list file?
A: The supplied list of actives supplements the actives in the complexes. The structural information is used to tune the relative weights of the various terms in the scoring function so that more faithful (low RMSD) poses get better scores. The actives are used to rescale the entire score based on a LASSO-like filter. In principle, the more actives, the better. But it is also important to have a diverse set of actives, and there is little use in adding ligands that are just small variations of others on the list.


Q: Can Tune train pose validation like eHiTS 6.2? If not, is the purpose of Tune to improve enrichment?
A: eHiTS 6.2 had facilities to carry out deeper tuning that affected PoseMatch and DockOptim. Those tuning modes have proved to have little effect, and on the other hand they were very expensive computationally. The tuning in eHiTS 2009 is designed to generate a better score differentiation between good and bad poses. Before and after tuning eHiTS will generate the same poses, but after tuning it will be able to give the better poses lower scores. In addition, the LASSO rescaling improves the enrichment capabilities, and once again, this is done through scoring and not through different pose generation.


Q: What are the properties of the decoy compounds?
A: The decoy compounds in the Tuning package, as in LASSO 2009, are chosen automatically from a set of decoys that was assembled from various sources, such as the DUD set. In each tuning run, a subset of decoys is chosen such that it forms as diverse a set as possible in terms of the LASSO descriptors, and such that it does not overlap with the set of actives, since there is always the risk that a decoy from the set may in truth have some activity.


Known Issues


Q: Why do I get charge changes in SDF output for multi-ligand runs? (in eHiTS 6.2)

A: There is a known bug in eHiTS version 6.2 that can change the charge of Nitrogens or Oxygens in the SDF output of a multiple ligand screen if you are using the "-out myresults.sdf" command line option. The problem is due to a mis-calculation of the partial charges in the final writing of the output file. The problem does not affect ligand pose or score, as it happens only during the writing of the final SDF file.

Workaround: The current workaround is to run eHiTS multi-ligand runs without using the "-out myresults.sdf" flag and to look for the results in the file:
$HOME/ehits_work/results/<RECEPTOR>/<LIGAND>/ehits_best.sdf.
This file does not contain the bug.
Single ligand docking runs are not affected by this bug at all.

Note: if you have run a multi-ligand run without reading this first, we do have scripts to correct the problem without re-running the job. Please contact us for more details.


Q: Why does the standalone "split" application from eHiTS 2009.1 not work for me?

A: The command line argument "-config path/parameters.cfg" must be specified, but it was omitted from the help usage text of the application. The correct syntax for the standalone "split" application is therefore:

Usage:

PATH/eHiTS_2009.1/Linux/bin/split input_file_name -config PATH/eHiTS_2009.1/data/parameters.cfg [MARGIN] [-keep_water] [debug-options]



Copyright © 2010 SimBioSys Inc., All rights reserved.