Alan Dix - research topics

The Monte-Carlo baseline

Search and optimisation techniques such as genetic algorithms (GA), simulated annealing (SA), and variants of hill climbing are suited to different kinds of problems. Each exploits properties of the underlying landscape of the evaluation function: hill climbing prefers continuous landscapes without local maxima, while genetic algorithms suit problems where sets of attributes affect the evaluation function in near-independent ways.

Note that many search methods using random choices could be described as "Monte Carlo". However, here Monte Carlo search refers to the extreme of simply randomly choosing states and keeping track of the best so far.
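For concreteness, here is a minimal sketch of such a pure Monte Carlo search in Python (the random_state and evaluate functions are hypothetical placeholders for whatever generates and scores states in a particular problem):

import random

def monte_carlo_search(random_state, evaluate, n_steps):
    """Pure Monte Carlo search: sample random states, keep the best so far.

    random_state -- function returning a uniformly chosen random state
    evaluate     -- evaluation function to be maximised
    n_steps      -- number of random samples to draw
    Returns the best state found and the 'best so far' trace.
    """
    best_state, best_value = None, float("-inf")
    trace = []                      # best evaluation seen after each step
    for _ in range(n_steps):
        s = random_state()          # choose a state completely at random
        v = evaluate(s)
        if v > best_value:          # keep track of the best so far
            best_state, best_value = s, v
        trace.append(best_value)
    return best_state, trace

# Example: maximise a toy evaluation (count of 1s) over random bit strings
# best, trace = monte_carlo_search(
#     lambda: [random.randint(0, 1) for _ in range(20)],
#     sum,
#     n_steps=1000)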

Where there is no underlying structure to the evaluation function, that is, where it is effectively random, an exhaustive search or a purely random Monte Carlo search is the best one can do. However, exhaustive search may perform very badly on certain kinds of evaluation function, for example, if the large values tend to be at the end of the search sequence. In contrast, a Monte Carlo search will tend to perform equally well, and equally badly, on all types of evaluation function.

In addition, where the evaluation structure is very badly behaved, many algorithms which include stochastic features reduce to Monte Carlo search. This means that for very poorly structured problems a pure Monte Carlo search is often the best choice: it is no worse than other algorithms in terms of algorithmic complexity and is a lot simpler.

Monte Carlo search can thus be seen as a sort of baseline against which to measure the effectiveness of other algorithms on different kinds of problem domain. For an algorithm A we can ask "how many steps would a Monte Carlo search have taken in order to get as good a solution as A has after n steps?". For an ill-fitting problem the answer would be around the same, but if the algorithm is well suited to the problem domain the Monte Carlo search would take far more steps.

One way to do this would be to test an algorithm by performing a Monte Carlo search, recording progress at each iteration (best so far), and then comparing this against a run of the algorithm. However, Monte Carlo search has a known average performance and this can be used to produce a more precise measure.

First of all we need to normalise the evaluation function, by transforming it into a 0-1 range. To do this, we simply rank the states and give each state a 0-1 value depending on how far up the ranking it is: best gets 1, worst gets 0, median gets 0.5.

We can define this more formally as follows. Given a set of states S, an evaluation function eval and a state s, we define the normalised evaluation norm(s) as the proportion of states that have evaluation values less than or equal to eval(s).

norm(s) = count{ s' ∈ S s.t. eval(s') ≤ eval(s) } / count(S)
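A small Python sketch of this normalisation, assuming the set of states is small enough to enumerate and evaluate in full (for large state spaces one would work with a sample instead). Note that, following the formal definition, the worst state gets 1/count(S) rather than exactly 0, which makes little difference for large state sets:

import bisect

def make_norm(states, evaluate):
    """Build the normalised evaluation norm(s) for a small, enumerable
    set of states: the proportion of states with eval <= eval(s)."""
    values = sorted(evaluate(s) for s in states)   # all evaluation values, ranked
    n = len(values)

    def norm(s):
        # number of states whose evaluation is <= eval(s)
        count_le = bisect.bisect_right(values, evaluate(s))
        return count_le / n

    return norm

# e.g. norm = make_norm(all_states, evaluate); norm(best_state) == 1.0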

For a given search algorithm, we can look, at each step, at the normalised value of the best state found so far. Typically this will increase quite rapidly at the beginning before tailing off, either to 1 if the search finds the best state, or to something less than 1.

Monte Carlo search has the property that the average behaviour of this 'best so far' measure can be calculated and is in fact 1 - 1/(n+1) (where n is the number of steps). This is because each randomly sampled state has a normalised value which, for a large state space, is effectively uniform on the 0-1 range, and the expected maximum of n such uniform values is n/(n+1) = 1 - 1/(n+1).

So, for any other search method we can take the 'best so far' normalised value after i steps, say vi, and invert the formula above (vi = 1 - 1/(ni+1)) to get the 'equivalent' number of Monte Carlo steps: ni = vi / (1 - vi)
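A short Python sketch of this conversion, applied to a trace of 'best so far' normalised values (one value per step) from any search run:

def equivalent_mc_steps(norm_trace):
    """Convert a 'best so far' trace of normalised values v_i into the
    equivalent number of Monte Carlo steps n_i = v_i / (1 - v_i).

    A value of exactly 1 (the global best has been found) maps to infinity."""
    return [v / (1 - v) if v < 1 else float("inf") for v in norm_trace]

# Plotting equivalent_mc_steps(trace) against the step number i gives
# roughly the line n_i ~ i for a Monte Carlo run, a steeper curve for a
# better-suited algorithm, and a shallower one for a worse-suited one.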

For any run of a Monte Carlo search this will approximate the straight line ni ~ i. If a search is doing better than Monte Carlo it will rise faster, if worse then slower! For example, hill climbing searches will typically rise much faster initially and then 'flat line', whereas for problems well suited to GAs or SA we would expect the curve to continue to rise faster (but I have no idea whether that is simply a faster linear growth or has some non-linear shape).


http://www.hcibook.com/alan/topics/ai/monte-carlo-baseline.html
maintained by Alan Dix