
Which values of the genetic algorithm parameters do you normally use?


This FAQ applies to: AutoDock 3, AutoDock 4

When running a docking using the GA (genetic algorithm) or LGA (Lamarckian GA), there are a number of parameters to set. Which values do you normally use?
Here is part of a typical DPF (docking parameter file) for AutoDock:
ga_pop_size 150            # number of individuals in population
ga_num_evals 25000000      # maximum number of energy evaluations
ga_num_generations 27000   # maximum number of generations
ga_elitism 1               # number of top individuals to survive to next generation
ga_mutation_rate 0.02      # rate of gene mutation
ga_crossover_rate 0.8      # rate of crossover
ga_window_size 10          # number of preceding generations used to set the worst-individual threshold
ga_cauchy_alpha 0.0        # Alpha parameter of Cauchy distribution
ga_cauchy_beta 1.0         # Beta parameter of Cauchy distribution
set_ga                     # set the above parameters for GA or LGA
sw_max_its 300             # iterations of Solis & Wets local search
sw_max_succ 4              # consecutive successes before changing rho
sw_max_fail 4              # consecutive failures before changing rho
sw_rho 1.0                 # size of local search space to sample
sw_lb_rho 0.01             # lower bound on rho
ls_search_freq 0.06        # probability of performing local search on individual
set_sw1                    # set the above Solis & Wets parameters
The parameters that begin with 'ga_' control the genetic algorithm, while the parameters that begin with 'sw_' (together with 'ls_search_freq') control the Solis and Wets local search method. This block of parameters, along with the "set_ga" and "set_sw1" commands, sets AutoDock up to run a hybrid global-local search, i.e. the Lamarckian GA; the dockings themselves are then launched with the 'ga_run' command.
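As a rough illustration of the "keyword value # comment" layout used in the DPF excerpt, the following Python sketch splits such a block into parameters and groups them by prefix. The functions parse_dpf_block and group_by_prefix are hypothetical helpers, not part of AutoDock, and the parser assumes one keyword per line with '#' starting a comment.

```python
def parse_dpf_block(text):
    """Hypothetical helper: map each DPF keyword to its value string.

    Assumes one keyword per line and that '#' starts a comment.
    Keyword-only lines such as set_ga and set_sw1 map to None.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        keyword, *rest = line.split(None, 1)
        params[keyword] = rest[0].strip() if rest else None
    return params


def group_by_prefix(params, prefixes=("ga_", "sw_", "ls_")):
    """Hypothetical helper: collect parameters by their keyword prefix."""
    groups = {prefix: {} for prefix in prefixes}
    for keyword, value in params.items():
        for prefix in prefixes:
            if keyword.startswith(prefix):
                groups[prefix][keyword] = value
    return groups


example = """\
ga_pop_size 150            # number of individuals in population
ga_num_evals 25000000      # maximum number of energy evaluations
set_ga                     # set the above parameters for GA or LGA
sw_max_its 300             # iterations of Solis & Wets local search
ls_search_freq 0.06        # probability of performing local search on individual
set_sw1                    # set the above Solis & Wets parameters
"""
groups = group_by_prefix(parse_dpf_block(example))
print(groups["sw_"])  # {'sw_max_its': '300'}
```

Keeping values as strings avoids guessing each keyword's type; a real tool would convert them per keyword.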

Which parameters are the most important?


The parameters that control how long a GA or LGA run lasts are 'ga_num_evals' and 'ga_num_generations'. AutoDock stops a docking when either the maximum number of energy evaluations or the maximum number of generations is reached, whichever comes first. In this example, the dockings terminate on reaching the maximum number of energy evaluations, namely 25 million, because they hit that limit before completing 27000 generations.

An energy evaluation is performed every time the GA or the local search computes the fitness of a candidate docking. With a population of 150, as specified by the 'ga_pop_size' parameter, each generation requires 150 energy evaluations to compute the fitness of every member of the population. If local search is enabled, the fraction of the population set by the 'ls_search_freq' parameter also undergoes local search. Here the local search frequency is 0.06, so on average 6% of the 150 individuals, or 9 individuals, are locally searched in each generation. The number of local search iterations is set to 300 by the 'sw_max_its' parameter, so each of these 9 local searches could consume up to 300 energy evaluations.

Note that the Solis and Wets local search method adapts its step size, rho, during the search: it expands rho after 'sw_max_succ' consecutive successes and contracts it after 'sw_max_fail' consecutive failures. A local search terminates when the current step size becomes smaller than 'sw_lb_rho' (here 0.01) or when the maximum number of iterations, 'sw_max_its', is reached, whichever happens first.
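To make the step-size rules concrete, here is a minimal Solis and Wets minimizer in Python. It is a sketch of the general technique, not AutoDock's implementation: the expansion and contraction factors (2.0 and 0.5) and the bias-update constants are assumed textbook values, and the simple objective function stands in for AutoDock's energy. The argument names mirror the 'sw_' keywords.

```python
import random


def solis_wets(f, x, rho=1.0, lb_rho=0.01, max_its=300,
               max_succ=4, max_fail=4, expand=2.0, contract=0.5, seed=0):
    """Minimize f from start point x with a Solis & Wets random local search.

    rho, lb_rho, max_its, max_succ and max_fail play the roles of sw_rho,
    sw_lb_rho, sw_max_its, sw_max_succ and sw_max_fail. The expand/contract
    factors and bias constants are ASSUMED textbook values.
    Returns (best_x, best_f, iterations_used, final_rho).
    """
    rng = random.Random(seed)
    x = list(x)
    fx = f(x)
    bias = [0.0] * len(x)
    succ = fail = its = 0
    # Stop when the iteration budget is used up or the step size is too small.
    while its < max_its and rho >= lb_rho:
        its += 1
        dev = [rng.gauss(b, rho) for b in bias]
        forward = [xi + di for xi, di in zip(x, dev)]
        f_fwd = f(forward)
        if f_fwd < fx:
            x, fx = forward, f_fwd
            bias = [0.2 * b + 0.4 * d for b, d in zip(bias, dev)]
            succ, fail = succ + 1, 0
        else:
            backward = [xi - di for xi, di in zip(x, dev)]
            f_bwd = f(backward)
            if f_bwd < fx:
                x, fx = backward, f_bwd
                bias = [b - 0.4 * d for b, d in zip(bias, dev)]
                succ, fail = succ + 1, 0
            else:
                bias = [0.5 * b for b in bias]
                succ, fail = 0, fail + 1
        # Adapt the step size after a run of successes or failures.
        if succ >= max_succ:
            rho, succ = rho * expand, 0
        elif fail >= max_fail:
            rho, fail = rho * contract, 0
    return x, fx, its, rho


def sphere(v):
    return sum(t * t for t in v)


best_x, best_f, its, final_rho = solis_wets(sphere, [3.0, -2.0])
print(best_f <= sphere([3.0, -2.0]), its <= 300)  # True True
```

Because only improving moves are accepted, the returned value never exceeds the starting value, and the loop exits on whichever of the two stopping conditions comes first.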

The number of energy evaluations needed for a docking depends on the number of torsions in the ligand (and in the receptor, if it is flexible). For a rigid receptor, here are some general guidelines based on the number of ligand torsions:

Number of torsions   ga_num_evals               ga_num_generations
0                    25 000 to 250 000          27 000
1-10                 250 000 to 25 000 000      27 000
>10                  >25 000 000                27 000
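When scripting DPF generation, the guideline table can be encoded as a small lookup. This is a hypothetical helper that simply restates the table; the returned upper bound is None for more than 10 torsions, where the table gives only a lower bound.

```python
def suggested_num_evals(num_torsions):
    """Hypothetical helper restating the ga_num_evals guideline table.

    Returns (low, high) for ga_num_evals; high is None when the table
    gives only a lower bound (more than 10 torsions).
    """
    if num_torsions < 0:
        raise ValueError("number of torsions must be non-negative")
    if num_torsions == 0:
        return 25_000, 250_000
    if num_torsions <= 10:
        return 250_000, 25_000_000
    return 25_000_000, None


for n in (0, 6, 14):
    print(n, suggested_num_evals(n))
```

'ga_num_generations' is 27 000 in every row of the table, so the helper does not return it.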

Some AutoDock users prefer to set 'ga_num_evals' to a very large number and then set 'ga_num_generations' to a value in the range 500 to 1000, so that the generation limit ends each docking. There are no hard-and-fast rules here, and it is well worth trying a few parameter variations on your own docking problem before settling on your best values.
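Which limit ends a run depends on how many evaluations each generation consumes. The sketch below uses a deliberately simplified model (an assumption, not how AutoDock accounts for evaluations internally): each generation costs 'ga_pop_size' evaluations plus the expected number of local searches times an assumed average cost per local search. The function name and the avg_ls_evals argument are hypothetical.

```python
def which_limit_first(pop_size, ls_search_freq, avg_ls_evals,
                      num_evals, num_generations):
    """Simplified model: estimate which stopping criterion is reached first.

    avg_ls_evals is an ASSUMED average number of energy evaluations per
    local search (at most sw_max_its in the example DPF).
    Returns (limit_name, estimated_generations_completed).
    """
    evals_per_gen = pop_size + pop_size * ls_search_freq * avg_ls_evals
    gens_at_eval_limit = num_evals / evals_per_gen
    if gens_at_eval_limit < num_generations:
        return "ga_num_evals", gens_at_eval_limit
    return "ga_num_generations", float(num_generations)


# Example DPF, with every local search assumed to use all 300 iterations:
print(which_limit_first(150, 0.06, 300, 25_000_000, 27_000))
# Very large ga_num_evals with ga_num_generations of 1000:
print(which_limit_first(150, 0.06, 300, 10**9, 1_000))
```

Setting avg_ls_evals to 0 shows the other extreme: at 150 evaluations per generation, 27 000 generations use only about 4 million evaluations, so the generation limit would be reached first.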

It is worth noting that Hetenyi and van der Spoel (2002) found that, for the same docking with everything else held constant, increasing 'ga_pop_size' from 50 to 300 in steps of 50 gave the most robust docking results at a population size of 300. You may want to increase the value of 150 used here to 300 and see whether your docking results improve.
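To try this, or to repeat a sweep like the one in that study, one option is to generate one DPF per population size from a template. The helper below is a hypothetical Python sketch; it assumes the template contains exactly one line starting with 'ga_pop_size <integer>' and leaves every other line untouched.

```python
import re

POP_SIZE_LINE = re.compile(r"^(ga_pop_size\s+)\d+", re.MULTILINE)


def pop_size_variants(dpf_text, sizes=range(50, 301, 50)):
    """Hypothetical helper: yield (size, dpf_text) with ga_pop_size replaced.

    Assumes exactly one 'ga_pop_size <integer>' line at the start of a line;
    comments and all other lines are kept as they are.
    """
    if len(POP_SIZE_LINE.findall(dpf_text)) != 1:
        raise ValueError("expected exactly one ga_pop_size line")
    for size in sizes:
        new_text = POP_SIZE_LINE.sub(lambda m: m.group(1) + str(size), dpf_text)
        yield size, new_text


template = "ga_pop_size 150   # number of individuals in population\nset_ga\n"
for size, text in pop_size_variants(template):
    print(text.splitlines()[0])
```

Writing each text to its own file (a name such as dock_pop300.dpf, chosen here purely for illustration) gives one docking job per population size.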

How many dockings should I run?


The more dockings you do, the better your statistics and clustering are likely to be. We recommend running at least 50 dockings, specified by the 'ga_run' parameter. Make sure that each AutoDock process starts with a different random number generator (RNG) seed. If you use the default 'seed pid time', the RNG is seeded with the current AutoDock process ID and the number of seconds since 00:00:00 Coordinated Universal Time (UTC) on January 1, 1970, not counting leap seconds.
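The 'seed pid time' idea can be illustrated in Python: a process ID together with whole seconds since the Unix epoch gives seed pairs that differ between processes running at the same time on one machine. This is only an illustration; it does not reproduce AutoDock's own random number generator, and the function name is hypothetical.

```python
import os
import random
import time


def pid_time_seeds():
    """Hypothetical helper mirroring 'seed pid time': (process ID, epoch seconds).

    time.time() counts seconds since 1970-01-01 00:00:00 UTC; on common
    platforms leap seconds are not included.
    """
    return os.getpid(), int(time.time())


pid_seed, time_seed = pid_time_seeds()
# Combine the two seeds into one integer for Python's own generator (illustration only).
rng = random.Random(pid_seed * 1_000_003 + time_seed)
print(pid_seed, time_seed, rng.random())
```

Jobs on different machines could still receive the same pair if they share a process ID and start in the same second, so on a cluster it may be worth checking that every job gets a distinct seed.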

References

Hetenyi, C. and van der Spoel, D. (2002) Efficient docking of peptides to proteins without prior knowledge of the binding site. Protein Science, 11(7): 1729-1737.

See also:

How many dockings and energy evaluations should I use for each compound?
How much computational time should be invested in each compound?
by morris last modified 2008-03-06 19:58
