How do I evaluate AutoDock's clustering results?
This FAQ applies to: AutoDock 3, AutoDock 4
Open a DLG (docking log) in a text editor and search for the word "HISTOGRAM". If you used the "analysis" keyword in your DPF (docking parameter file), you will find AutoDock's conformational clustering histogram. This sorts the docking results into conformationally similar bins, according to the RMSD tolerance you set using the rmstol keyword, and according to whether you used the rmsnosym command. (By default, AutoDock tries to compute the minimum RMSD by taking into consideration the symmetry in the molecule, which works well if the two conformations are very similar; using the rmsnosym command guarantees that a 1-to-1 correspondence of atoms is considered in computing the RMSD. ADT does not consider symmetry in the RMSD calculations in clustering, and uses the same algorithm as the rmsnosym command.)
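The difference between the two RMSD modes can be illustrated with a small Python sketch. It is not AutoDock's code: the function names and the (atom type, coordinates) data layout are hypothetical, and the symmetry-tolerant version assumes a simple scheme in which each atom is paired with the nearest atom of the same type in the other conformation.

```python
import math

def rmsd_one_to_one(conf_a, conf_b):
    """RMSD pairing atom i of conf_a with atom i of conf_b (no symmetry handling)."""
    if len(conf_a) != len(conf_b):
        raise ValueError("conformations must have the same number of atoms")
    total = sum(math.dist(p, q) ** 2 for (_, p), (_, q) in zip(conf_a, conf_b))
    return math.sqrt(total / len(conf_a))

def rmsd_nearest_same_type(conf_a, conf_b):
    """Symmetry-tolerant RMSD (illustrative assumption, not AutoDock's exact method):
    each atom of conf_a is paired with the closest atom of the same type in conf_b."""
    total = 0.0
    for atom_type, p in conf_a:
        total += min(math.dist(p, q) ** 2 for b_type, q in conf_b if b_type == atom_type)
    return math.sqrt(total / len(conf_a))

# Hypothetical carboxylate-like fragment whose two oxygens are swapped between poses.
pose_1 = [("C", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0)), ("O", (-1.0, 0.0, 0.0))]
pose_2 = [("C", (0.0, 0.0, 0.0)), ("O", (-1.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0))]

print(rmsd_one_to_one(pose_1, pose_2))         # penalizes the swapped oxygens
print(rmsd_nearest_same_type(pose_1, pose_2))  # treats the swap as equivalent
```

In this toy case the two poses are chemically identical, so the 1-to-1 RMSD overstates the difference while the symmetry-tolerant value does not. This kind of nearest-atom matching is only dependable when the conformations are already close.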
NOTE: by default, AutoDock 4 uses only the ligand atoms for the cluster analysis, even if you have flexible sidechains in the receptor. You can use the new command rmsatoms all to include all the moving atoms in the RMSD calculation; the alternative form of the command, rmsatoms ligand_only, computes the RMSD for only the atoms in the ligand, although this command is not necessary since this is the default.
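To make the idea of sorting results into RMSD bins concrete, here is a minimal Python sketch of an energy-ranked, tolerance-based clustering. It is an assumption about the general approach, not AutoDock's implementation: the function names, the (energy, coordinates) data layout, and the choice to compare each pose against the lowest-energy member of each cluster are all illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Plain 1-to-1 RMSD between two equally sized lists of (x, y, z) tuples."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def cluster_by_energy(poses, rmstol):
    """Group poses into clusters, visiting them from lowest to highest energy.

    poses:  list of (energy, coords) tuples, one per docking run (hypothetical layout).
    rmstol: RMSD tolerance in Angstroms.
    A pose joins the first cluster whose lowest-energy member lies within rmstol;
    otherwise it starts a new cluster.
    """
    clusters = []
    for pose in sorted(poses, key=lambda p: p[0]):
        for cluster in clusters:
            if rmsd(pose[1], cluster[0][1]) <= rmstol:
                cluster.append(pose)
                break
        else:
            clusters.append([pose])
    return clusters

# Toy single-atom "poses" placed in two well-separated sites.
poses = [
    (-7.0, [(0.0, 0.0, 0.0)]),
    (-6.5, [(0.5, 0.0, 0.0)]),
    (-6.8, [(10.0, 0.0, 0.0)]),
    (-6.0, [(10.5, 0.0, 0.0)]),
    (-5.0, [(0.2, 0.0, 0.0)]),
]
for rank, cluster in enumerate(cluster_by_energy(poses, rmstol=2.0), start=1):
    print(rank, cluster[0][0], len(cluster))
```

Because poses are visited in order of increasing energy, the first member of each cluster is its lowest-energy pose, and the clusters come out ranked by that energy.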
If you find more than one cluster, which one should you choose?
The answer depends on a number of factors. First of all, it's best to have done at least 50 runs, to get a good sampling of results to cluster (I prefer to do at least 100 dockings). Also, use an RMSD tolerance that is appropriate for the size of your ligand: larger ligands need larger rmstol values, typically at least 2 Angstroms.
The next question is, did each docking search for long enough? In other words, did the number of energy evaluations (ga_num_evals in GA and LGA dockings; rejs in SA dockings) match the dimensionality of the search problem? This depends on the number of torsions in the ligand (and protein, if flexible), and how these torsions are arranged in the molecule (are they arranged linearly, or are they nested?). Ideally, if you run a docking for long enough, you should always converge on the lowest energy solution, and obtain just one cluster.
However, if you obtain two or more clusters, and the lowest-energy cluster is less populated than another cluster with higher energy, which one is the "right" answer? What happens to the clustering results if you increase the number of energy evaluations? Does the size of the lowest-energy cluster increase to exceed the number in the other cluster?
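One way to track these questions across repeated experiments is to record the histogram rows by hand and compare them. The following Python sketch assumes you have transcribed each cluster's rank, lowest energy, mean energy, and population yourself; the Cluster type and both helper functions are hypothetical and are not part of AutoDock or ADT.

```python
from typing import NamedTuple

class Cluster(NamedTuple):
    rank: int             # cluster rank from the histogram
    lowest_energy: float  # kcal/mol
    mean_energy: float    # kcal/mol
    size: int             # number of runs in the cluster

def lowest_energy_is_most_populated(clusters):
    """True if no other cluster has more members than the lowest-energy cluster."""
    best = min(clusters, key=lambda c: c.lowest_energy)
    return best.size >= max(c.size for c in clusters)

def lowest_energy_fraction(clusters):
    """Fraction of all runs that landed in the lowest-energy cluster."""
    best = min(clusters, key=lambda c: c.lowest_energy)
    return best.size / sum(c.size for c in clusters)

# Made-up numbers for two experiments that differ only in the number of energy evaluations.
short_search = [Cluster(1, -7.5, -7.2, 4), Cluster(2, -7.0, -6.8, 10)]
long_search = [Cluster(1, -7.6, -7.4, 11), Cluster(2, -7.0, -6.9, 3)]

for label, clusters in (("short", short_search), ("long", long_search)):
    print(label, lowest_energy_is_most_populated(clusters), lowest_energy_fraction(clusters))
```

If the fraction for the lowest-energy cluster grows as the searches get longer, that suggests the shorter searches had not converged.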
Which cluster you choose should also depend on a visual inspection of the binding modes, comparing how the ligand interacts with the receptor. Does one binding mode look more chemically reasonable than the other(s)?
Also bear in mind that if the difference between the mean energies of the two clusters is less than about 2.5 kcal/mol, this is within the standard deviation of the AutoDock force field, and it is difficult to say which one is the "correct" one.
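As a rough illustration of this rule of thumb, the Python sketch below compares the mean energies of two clusters against the approximate 2.5 kcal/mol figure quoted above. The function name and the example energies are made up for illustration; the threshold is the only value taken from the text.

```python
from statistics import mean

FORCE_FIELD_STD = 2.5  # kcal/mol, the approximate figure quoted in the text

def mean_energies_distinguishable(energies_a, energies_b, threshold=FORCE_FIELD_STD):
    """Return True if the two cluster mean energies differ by at least `threshold`.

    energies_a, energies_b: binding energies (kcal/mol) of the members of each cluster.
    """
    return abs(mean(energies_a) - mean(energies_b)) >= threshold

# Made-up member energies for two clusters.
cluster_1 = [-7.5, -7.1]
cluster_2 = [-6.9, -6.5]
print(mean_energies_distinguishable(cluster_1, cluster_2))
```

A False result here means the energies alone cannot rank the two binding modes, so cluster population and visual inspection carry more weight.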
If you have two ligands, and they bind to the same receptor, but one forms just one cluster, while the other forms more than one cluster, yet they both bind with about the same estimated binding free energy, which one will be better? This is a key question that we are currently investigating ways to quantify: stay tuned!