
FAQs

Frequently Asked Questions (& Answers!)

Analysing Results

How do I visualise the docking results in the AutoDock log file?

I have finished my docking with AutoDock, and I have a DLG (docking log file): how do I visualise the results?

ADT

  • We recommend you download ADT (short for "AutoDockTools") which we have developed just for this purpose. ADT is part of the MGLTools and it runs on Windows, Linux and Mac OS X. (You need X11 on Mac OS X). Use the menu option "Analyze > Dockings > Open..." for one DLG, or "Analyze > Dockings > Open All..." for many DLGs containing docking results of the same ligand and target, as you might obtain from running a docking in parallel on a cluster.

This FAQ applies to: AutoDock 3, AutoDock 4

How do I evaluate AutoDock's clustering results?

If I have more than one cluster after doing conformational cluster analysis, which one do I choose?

Open a DLG (docking log) in a text editor and search for the word "HISTOGRAM", and if you used the "analysis" keyword in your DPF (docking parameter file), then you will find AutoDock's conformational clustering histogram. This sorts the docking results into conformationally similar bins, according to the RMSD tolerance you set using the rmstol keyword, and according to whether you used the rmsnosym command. (By default, AutoDock tries to compute the minimum RMSD by taking into consideration the symmetry in the molecule, and works well if the two conformations are very similar; using the rmsnosym command guarantees that a 1-to-1 correspondence of atoms is considered in computing the RMSD. ADT does not consider symmetry in the RMSD calculations in clustering, and uses the same algorithm as the rmsnosym command.)
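If you prefer not to scroll through the DLG by hand, the search described above can be scripted. The following is only a minimal sketch in Python: the function name find_keyword_lines is made up for this example, and it assumes nothing about the layout of the histogram beyond the presence of the word "HISTOGRAM" on a line.

```python
from typing import Iterable, List, Tuple


def find_keyword_lines(lines: Iterable[str],
                       keyword: str = "HISTOGRAM") -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for every line containing keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if keyword in line:
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_histogram.py my_docking.dlg
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for number, text in find_keyword_lines(handle):
                print(f"{number}: {text}")
```

The line numbers it prints tell you where to jump to in your text editor.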

NOTE: by default, AutoDock 4 uses only the ligand atoms for the cluster analysis, even if you have flexible sidechains in the receptor. You can use the new command rmsatoms all to include all the moving atoms in the RMSD calculation. The alternative form of the command, rmsatoms ligand_only, computes the RMSD for only the atoms in the ligand; you do not need to give it explicitly, since this is the default.

If you find more than one cluster, which one should you choose?

The answer depends on a number of factors: first of all, it's best to have done at least 50 runs, to get a good sampling of results to cluster (I prefer to do at least 100 dockings). Also, use an RMSD tolerance that is appropriate for the size of your ligand: larger ligands need larger rmstol values, typically at least 2 Angstroms.

The next question is, did each docking search for long enough? In other words, did the number of energy evaluations (ga_num_evals in GA and LGA dockings; cycles, accs and rejs in SA dockings) match the dimensionality of the search problem? This depends on the number of torsions in the ligand (and protein, if flexible), and how these torsions are arranged in the molecule (are they arranged linearly, or are they nested?). Ideally, if you run a docking for long enough, you should always converge on the lowest energy solution, and obtain just one cluster.

However, if you obtain two or more clusters, and the lowest-energy cluster is less populated than another cluster with higher energy, which one is the "right" answer? What happens to the clustering results if you increase the number of energy evaluations? Does the size of the lowest-energy cluster increase to exceed the number in the other cluster?

Which cluster you choose should also depend on a visual inspection of the binding modes, comparing how the ligand interacts with the receptor. Does one binding mode look more chemically-reasonable than the other(s)?

Also bear in mind that if the difference between the mean energies of the two clusters is less than about 2.5 kcal/mol, this is within the standard deviation of the AutoDock force field, and it is difficult to say which one is the "correct" one.
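As a rough illustration of this rule of thumb, here is a minimal Python sketch. The function name and the example energies are hypothetical; only the 2.5 kcal/mol figure comes from the text above.

```python
def clusters_distinguishable(mean_energy_a: float,
                             mean_energy_b: float,
                             tolerance: float = 2.5) -> bool:
    """Return True if two cluster mean energies (kcal/mol) differ by at
    least `tolerance`, i.e. by more than the quoted force field error."""
    return abs(mean_energy_a - mean_energy_b) >= tolerance


# Hypothetical mean energies for two clusters, in kcal/mol.
if clusters_distinguishable(-9.1, -7.8):
    print("The energy difference exceeds the force field error.")
else:
    print("The energies alone cannot separate these two clusters.")
```

When the function returns False, fall back on the other criteria discussed here, such as cluster population and visual inspection.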

If you have two ligands, and they bind to the same receptor, but one forms just one cluster, while the other forms more than one cluster, yet they both bind with about the same estimated binding free energy, which one will be better? This is a key question that we are currently investigating ways to quantify: stay tuned!

This FAQ applies to: AutoDock 3, AutoDock 4

I used "get-dockings" to extract the docked conformations. Where is the macromolecule?

After you extract the docked conformations from the AutoDock log file using get-dockings, you obtain a PDB formatted file that contains the docked conformations of the ligand. These are sorted in order of increasing energy, and in accordance with the conformational clustering. This PDB file puts each docked conformation in between MODEL and ENDMDL records. This file does not contain the macromolecule coordinates which you docked to. Don't Panic! These are still in the original PDBQS file you used to generate the AutoGrid maps. In other words, the output docked coordinates of the ligand are written in the same reference frame as the original macromolecule PDBQS.

To view the dockings in relation to the macromolecule, in InsightII 2000, for example, read in the PDB file of MODELs first (with "Keep all frames" turned on). Then read in the macromolecule PDBQS file using the first docked conformation as the "Reference" structure. You can read in the PDBQS file as a PDB formatted file (it works in InsightII, except you do not see PDBQS files in the file browser: you must type in the PDBQS file name yourself.) You should now see the macromolecule and the docked conformations together.

Alternatively, you can add the macromolecule and the docked conformations together into one PDB file that contains everything. At the UNIX prompt, type this:

  % get-dockings mydocking.dlg 
  % pdbqtopdb mymacromolecule.pdbqs >> mydocking.dlg.pdb

This will append the macromolecule (in PDB format) to the stacked MODELs in the "mydocking.dlg.pdb" file, so you can now read in this file and you should see everything together.

I get very high Reference RMSD values in my DLG; what went wrong?

I get very high Reference RMSD values in my DLG (docking log file); what went wrong?

The "Reference RMSD" values that are printed in the "RMSD TABLE" in the DLG are computed from the coordinates of

  • either the input ligand (PDBQ or PDBQT) file specified by the "move" command in the DPF, if you did not include the "rmsref" command in your DPF;
  • or the ligand (PDBQ or PDBQT) file you specified in the "rmsref" command in the DPF.

If you do not specify the "rmsref" command, and the ligand input coordinates happen to be translated far from the receptor, you will appear to get high Reference RMSD values. _Don't Panic!_ This is normal, and you don't need to worry about these high values.

You may want to confirm that the input ligand coordinates really are far from the crystallographically observed binding position in the active site, by reading in the input ligand and the receptor in ADT.

We usually use the "rmsref" DPF-command to specify the x-ray crystallographic coordinates of a known binding mode taken from a complex PDB structure. This is a useful way of checking whether your redocking is successful: in a successful redocking, the Reference RMSD values are less than 2-3 Å from the crystal structure position of the ligand.
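To make clear what such an RMSD measures, here is a minimal Python sketch of an RMSD with a 1-to-1 correspondence of atoms and no superposition, which is reasonable here because the docked and reference coordinates share the same reference frame. This is an illustration only, not AutoDock's own code: it ignores molecular symmetry, and the function name rmsd is an assumption of this example.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists.

    Atom i in coords_a is compared with atom i in coords_b; no fitting
    or symmetry handling is done.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

An input ligand that starts far from the receptor gives a large value from this formula even when the docking itself is fine, which is why high Reference RMSD values without "rmsref" are nothing to worry about.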

This FAQ applies to: AutoDock 3, AutoDock 4

Is there a way to save a protein-ligand complex as a PDB file in AutoDock?

I have completed my dockings of a ligand to a protein. Is there a way to save a protein-ligand complex as a PDB file in AutoDock?

Extracting Dockings from DLG Files

AutoDock 4 writes out the coordinates of the atoms in the ligand (and any moving parts of the receptor, if you are doing a flexible sidechain docking). It does not write out the coordinates of the fixed part of the receptor. Each docked conformation is written in PDBQT format in the DLG (docking log file). (Note that AutoDock 3 writes in PDBQ format). So if you did 10 dockings, there should be 10 different docked conformations in the DLG.

Each line of the PDBQT-formatted docked conformation is prefixed by the string "DOCKED: ", so it is possible to extract these lines from the DLG using a couple of simple UNIX commands. You need to go to the UNIX/Linux/Cygwin/Mac OS X Terminal and change directory (cd) to the directory that contains your DLG, then type the following line at the command line and press < Return > (substitute my_docking.dlg with the name of your DLG):

    grep '^DOCKED' my_docking.dlg | cut -c9- > my_docking.pdbqt

The grep command is a UNIX command that prints lines that match a pattern. Here, the pattern is ^DOCKED, and the ^ or caret symbol (Shift-6 on most keyboards) means "at the start of the line", so this pattern matches all lines that begin with the prefix "DOCKED". The | character (Shift-\ or Shift-backslash) is called a pipe, and it takes the output of the command on its left and feeds it into the input of the command on the right. The cut command selects portions of each line, and the flag -c9- means "keep the characters from column 9 to the end of the line", which has the effect of removing the 8-character "DOCKED: " prefix from the line. The last part of the command, > my_docking.pdbqt, uses the > redirect operator (Shift-. or the greater-than symbol) to save the output into a new file called my_docking.pdbqt.
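If you do not have grep and cut available (for example, on Windows without Cygwin), the same extraction can be sketched in a few lines of Python. This is a minimal equivalent of the pipeline above, not an official AutoDock tool; the function name extract_docked is made up for this example.

```python
from typing import Iterable, Iterator


def extract_docked(lines: Iterable[str]) -> Iterator[str]:
    """Mimic: grep '^DOCKED' | cut -c9-

    Keep lines that start with "DOCKED" and drop their first 8
    characters, i.e. the "DOCKED: " prefix.
    """
    for line in lines:
        if line.startswith("DOCKED"):
            yield line[8:]


def dlg_to_pdbqt(dlg_path: str, pdbqt_path: str) -> None:
    """Write the docked conformations found in dlg_path to pdbqt_path."""
    with open(dlg_path) as source, open(pdbqt_path, "w") as target:
        target.writelines(extract_docked(source))
```

Calling dlg_to_pdbqt("my_docking.dlg", "my_docking.pdbqt") should then produce the same file as the UNIX command.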

Converting from PDBQT to PDB

To convert from PDBQT format to PDB format, the simplest thing to do is to remove the charge (Q) and atom type (T) columns; this can be achieved using a simple UNIX command. Make sure you are in the same directory where you created my_docking.pdbqt, and type:

    cut -c-66 my_docking.pdbqt > my_docking.pdb

This will create a PDB file containing all of the docking results. Each docking will appear as a single "MODEL", which is the PDB record usually used to denote an NMR model. Each "MODEL" will contain the ligand and any moving parts of the receptor. If you would like to view the models in this PDB file, you can load the multi-model PDB file in a program like PyMOL (http://pymol.sourceforge.net/) and then click on the "Play" button to play through all the docked conformations. Click the "Stop" button to halt the play-back, and click on the ">" and "<" buttons to step through the conformations one at a time. You can also load the PDB file of the receptor, to see how the ligand interacts with it.
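The column trimming done by cut -c-66 can also be sketched in Python, which may help if you are already processing the DLG with a script. The function name pdbqt_line_to_pdb is an assumption of this example; the cut-off at column 66 is taken from the command above.

```python
def pdbqt_line_to_pdb(line: str, last_column: int = 66) -> str:
    """Mimic: cut -c-66

    Keep columns 1 to last_column of a line, dropping the charge (Q)
    and atom type (T) columns that follow. Shorter lines are unchanged.
    """
    text = line.rstrip("\r\n")
    return text[:last_column]


def pdbqt_to_pdb(pdbqt_path: str, pdb_path: str) -> None:
    """Write a trimmed copy of pdbqt_path to pdb_path."""
    with open(pdbqt_path) as source, open(pdb_path, "w") as target:
        for line in source:
            target.write(pdbqt_line_to_pdb(line) + "\n")
```

Short records such as MODEL and ENDMDL pass through untouched, so the multi-model structure of the file is preserved.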

Splitting a Multi-Model PDB File into Separate PDB Files

If you want to split the PDB file that contains all the docked conformations, my_docking.pdb, into separate PDB files each containing just one docking, then use these commands:

    set a=`grep ENDMDL my_docking.pdb | wc -l`
    set b=`expr $a - 2`
    csplit -k -s -n 3 -f my_docking. my_docking.pdb '/^ENDMDL/+1' '{'$b'}'
    foreach f ( my_docking.[0-9][0-9][0-9] )
      mv $f $f.pdb
    end

For example, if there were 50 ENDMDL records in the file my_docking.pdb, these commands would create 50 separate PDB files numbered from 000 to 049, and they would be named my_docking.000.pdb, my_docking.001.pdb, my_docking.002.pdb and so on, all the way up to my_docking.049.pdb.
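The commands above use csh syntax (set, foreach), so they will not work as typed in bash or other Bourne-style shells. As a shell-independent alternative, here is a minimal Python sketch that follows the same naming scheme. The function names are made up for this example, and, unlike csplit, any non-blank lines after the last ENDMDL are written to their own final file.

```python
import os
from typing import Iterable, List


def split_models(lines: Iterable[str]) -> List[List[str]]:
    """Group lines into chunks, closing a chunk after each ENDMDL record."""
    models: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        current.append(line)
        if line.startswith("ENDMDL"):
            models.append(current)
            current = []
    # Keep any non-blank lines that follow the last ENDMDL as a final chunk.
    if any(text.strip() for text in current):
        models.append(current)
    return models


def model_filename(prefix: str, index: int) -> str:
    """Build names such as my_docking.000.pdb, my_docking.001.pdb, ..."""
    return f"{prefix}.{index:03d}.pdb"


def write_models(models: List[List[str]],
                 prefix: str = "my_docking",
                 directory: str = ".") -> None:
    """Write each chunk to its own numbered PDB file."""
    for index, model in enumerate(models):
        path = os.path.join(directory, model_filename(prefix, index))
        with open(path, "w") as handle:
            handle.writelines(model)
```

To use it, read my_docking.pdb with open("my_docking.pdb"), pass the file object to split_models, and hand the result to write_models.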

Creating a PDB File of the Complex

To create a single PDB file that contains a complex of both the receptor and all the models of the docked ligand, you can use the following command to combine the PDB file of the receptor (receptor.pdb) and all the docked conformations of the ligand stored in 'my_docking.pdb':

    cat receptor.pdb my_docking.pdb | grep -v '^END   ' | grep -v '^END$' > complex.pdb

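This uses the UNIX cat command, which concatenates files together, and the grep command with the -v flag, which prints all the lines except those that match the pattern (here, the END records). The result is a new PDB file called complex.pdb that contains the coordinates of the receptor, followed by all the models of the docked ligand stored in the PDB file my_docking.pdb.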

To create a PDB file that contains a complex of the receptor and a single docked conformation of the ligand, first split the multi-model docked PDB file as described above and select the docked conformation you want. Then combine the PDB file of the receptor (receptor.pdb) with that conformation. Say we chose the ligand conformation in 'my_docking.042.pdb':

    cat receptor.pdb my_docking.042.pdb | grep -v '^END   ' | grep -v '^END$' > complex.pdb

As before, cat concatenates the files, and grep with the -v flag prints all the lines except the END records. This command creates a new PDB file called complex.pdb that contains the coordinates of the receptor, followed by the single docked conformation of the ligand stored in the PDB file my_docking.042.pdb.
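For completeness, the cat and grep -v step can be sketched in Python as well. This is a minimal equivalent, not an exact one: it drops every line that consists of END plus optional trailing whitespace, which covers the two grep patterns for ordinary PDB files while leaving ENDMDL records in place. The function name build_complex is an assumption of this example.

```python
from typing import Iterable, List


def build_complex(receptor_lines: Iterable[str],
                  ligand_lines: Iterable[str]) -> List[str]:
    """Concatenate receptor and ligand records, dropping END records.

    ENDMDL lines are kept, because only a bare "END" (optionally padded
    with whitespace) is treated as an END record.
    """
    combined = []
    for line in list(receptor_lines) + list(ligand_lines):
        if line.rstrip() == "END":
            continue
        combined.append(line)
    return combined


def write_complex(receptor_path: str, ligand_path: str,
                  complex_path: str = "complex.pdb") -> None:
    """Read both PDB files and write the combined complex."""
    with open(receptor_path) as receptor, open(ligand_path) as ligand:
        lines = build_complex(receptor, ligand)
    with open(complex_path, "w") as target:
        target.writelines(lines)
```

For example, write_complex("receptor.pdb", "my_docking.042.pdb") should give the same complex.pdb as the command above.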

This FAQ applies to: AutoDock 4

What should I look for when I visualize a docked compound?

What features should I look for when I visualize a docked compound?

The first thing to check is that the ligand fits into some kind of pocket on the receptor. The second is that there is a chemical match between the atoms in the ligand and those in the receptor. For example, check that carbon atoms in the ligand are near hydrophobic atoms in the receptor, while nitrogens and oxygens in the ligand are near complementary hydrogen bonding atoms. Check for charge complementarity. Also consider whatever else you may know about your particular system: for instance, if you know the mechanism of action of the enzyme and which residues are involved, examine how the ligand binds to these residues. In the case of HIV-1 Protease, good inhibitors bind in a mode that mimics the transition state, placing a hydroxyl group near the two catalytic aspartic acid sidechains.

This FAQ applies to: AutoDock 3, AutoDock 4

In a flexible receptor docking in AutoDock 4, which atoms are used in the clustering?

When AutoDock 4 performs conformational clustering on the docking results, which atoms are used in the clustering?

In AutoDock 4, conformational clustering is performed after all the dockings have finished if the keyword analysis is given in the docking parameter file (DPF).

By default, only the atoms in the moving ligand (defined by the move keyword in the DPF) are used in the RMSD clustering calculations. The DPF keyword rmsatoms can take the argument all, which tells AutoDock 4 to include the atoms of the flexible residues in the receptor in the RMSD calculations for the clustering.

This FAQ applies to: AutoDock 4

Why do the results differ when multiple dockings are done with the same input?

Multiple docking calculations are specified in a docking parameter file using one of the following keywords: 'ga_run', 'do_local_only' or 'runs' plus 'simanneal'. In general, the results will differ between dockings when using the same input.

1. AutoDock uses a random number generator to create new poses for the ligand
during its search. 

2. The random number generator used by AutoDock produces a sequence of random
numbers based on two initial seeds. The new conformations for the search are
created using this sequence of random numbers to set location, orientation and
torsion values.

3. The default values for these two seeds are 'pid' and 'time'. Process id
and time vary between AutoDock calculations.

4. Therefore, the sequence of random numbers is different between different
AutoDock calculations. As a result, the 'search' is encountering a different
set of random conformations. Thus the results differ.

Please note, it is possible to specify the seeds explicitly. In this case,
multiple AutoDock calculations should give the same, albeit restricted, results.
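The effect of the seeds can be illustrated with any seeded random number generator. The following Python sketch is an analogy only: it uses Python's own generator, not AutoDock's, and the way the two seeds are combined into one is an arbitrary choice made for this example.

```python
import random
from typing import List


def random_sequence(seed_a: int, seed_b: int, count: int = 5) -> List[float]:
    """Return `count` pseudo-random numbers determined by two seeds."""
    # Combine the two seeds into a single integer (arbitrary scheme,
    # used here only because Python's generator takes one seed).
    generator = random.Random(seed_a * 1_000_003 + seed_b)
    return [generator.random() for _ in range(count)]


# The same pair of seeds always reproduces the same sequence...
assert random_sequence(1234, 5678) == random_sequence(1234, 5678)

# ...while seeds that vary from run to run (such as a process id and
# the current time) give a different sequence each time.
print(random_sequence(1234, 5678) == random_sequence(4321, 8765))
```

In the same way, fixed seeds make a docking repeatable, but every repeat explores the same sequence of random conformations.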

A separate consideration is that convergence of the docked results is an
indication of the thoroughness of the search.  If the dockings in a single
AutoDock calculation vary greatly, possibly the search hasn't been
thorough enough to find the minima in the energy landscape for a particular
docking problem.  In this case, the search may need to be extended by
increasing the number of energy evaluations or the number of generations.

This FAQ applies to: AutoDock 3, AutoDock 4

by morris last modified 2007-07-19 17:31
Contributors: Ruth Huey, Garrett M. Morris, Sargid Dallakyan, Stefano Forli
