
Tuesday, June 23, 2009

Solving A Tetris Cube, Recursive Backtracking, Algorithm X, Oh My!

A few days ago I got my hands on one of these:




A Tetris Cube. Twelve pieces, and a box of 4 by 4 by 4. Your task: to put the pieces neatly back in the box. The box said there are more than 9000 solutions, but I nonetheless knew this wasn't going to be an easy task. I was going to solve it. "Without cheating!", I proclaimed proudly.

An hour or so later these little blocks were completely driving me crazy. Usually I'm not that bad with these kinds of puzzles, but this thing certainly proved to be very difficult. I decided to call it a day and would try again later.

The next day I tried again, I was making a little progress, but every time I came close a little piece stuck out just a tiny little bit! This was going nowhere. If I were to beat this devil's contraption, I would need to resort to other means.

"I'll just throw together a recursive function real quick, that should be easy enough, right? (And no, that is not cheating...)" How I came to regret those words. In my experience, this:
I'll just quickly find a solution with recursive backtracking.
Almost always turns into this:
Damn, damn, damn! Why is it so slow? How can I speed this thing up?
But first things first. There are a few ways you could solve this; let's take a slight detour to explain backtracking (you may skip this if you already know all this, but then you probably know the solution to the puzzle as well, or would have handled things better than I did):

Detour: Recursive backtracking in general


Let's see what a recursive backtracking function generally looks like:

recursive_function(level) {
  for all possible ways for something {
    try it
    if solution valid {
      if solution complete {
        done, show solution
      }else{
        recursive_function(next_level)
      }
    }
    (revert this try)
  }
}


As you can see, this method works with a level, which we advance in each recursive call, and a set of local decisions we make at each level. Take a magic square, for example (image from the Wikipedia page):



With the above method, we would define our level as the position in the grid (ranging from 1 to 9, as there are 9 spaces to fill with a number), the decisions at each level would be a number ranging from 1 to 9, our function would be:

recursive_function(position) {
  for number from 1 to 9, not used elsewhere already {
    put this number on this position, make it unavailable
    if solution valid {
      if solution complete {
        done, show solution
      }else{
        recursive_function(next_position)
      }
    }
    (make this space blank again, and the number available)
  }
}


A few remarks:
(1) As each number can only be used once in the whole square, we must keep an array or set somewhere to keep track of the numbers used, which is easy to do in any programming language.
(2) Note that we can check mid-solution; we don't have to wait until we are at position 9 (the last position) to check if the solution is valid. In fact, we can speed things up a bit by implementing a few optimizations. For example: after each row is done (at positions 3, 6 and 9), check if this row sums to 15; if it does not, there is no sense continuing towards deeper levels in our tree.
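
To make remark (2) concrete, here is a minimal runnable version of this search, sketched in Python (the function name and the flat-list encoding are my own choices, not part of the original post):

def solve_magic(square=None, position=0, used=None):
    # Backtracking search for 3x3 magic squares; `square` is a flat
    # list of 9 cells, filled left to right, top to bottom.
    if square is None:
        square, used = [0] * 9, set()
    if position == 9:
        # rows were already pruned below; still verify columns and diagonals
        cols_ok = all(square[c] + square[c + 3] + square[c + 6] == 15
                      for c in range(3))
        diags_ok = (square[0] + square[4] + square[8] == 15 and
                    square[2] + square[4] + square[6] == 15)
        if cols_ok and diags_ok:
            print(square)
        return
    for number in range(1, 10):
        if number in used:
            continue
        square[position] = number
        used.add(number)
        # the optimization from remark (2): as soon as a row is
        # complete, its sum must be 15, otherwise don't recurse
        if position % 3 != 2 or sum(square[position - 2:position + 1]) == 15:
            solve_magic(square, position + 1, used)
        square[position] = 0
        used.remove(number)

solve_magic()   # prints all eight 3x3 magic squares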

Speaking of trees, I hope you see how the combination of levels and decisions correspond to a tree. If you do not: here is a picture:



(Not all levels, lines and branches are shown, but it should give you a pretty clear idea.) Note that, if we picked e.g. the number 2 for position (level) 1, we cannot use it in the following levels, as shown in the tree.

Now let's see how the function works. We start at the first level, with the first number:


This is a valid choice, so we can step down to the next level. In level 2, the first number we can pick is 2:


This is a valid choice, so we can step down to the next level. In level 3, we can pick 3, let's see what happens:


Since we now have a complete row, we can check if the sum 1+2+3 equals 15. It doesn't, so it makes no sense to continue at this point. Look at the huge part of the tree we can "cut off", as shown by the red curved line.

We're still in level 3, so we try the next number: 4. 1+2+4=7, which isn't 15. We can cut away this part as well. Number 5 gives 8, 6 gives 9, and so on, until number 9 gives 1+2+9=12: all the branches have been cut. We can make an important observation here: your optimizations should try to cut away from the tree as early as possible; that way, the solution space we have to search can be decreased dramatically. (This is also the reason why the hardest choices or levels are often put first in these trees.)

Now the algorithm "backtracks" to level 2, and tries the next number there: 3:


And so on (the yellow numbers are the ones we've "done", or excluded in our optimization). This continues until we find a solution, e.g.:
Level 1: 2, level 2: 7 and level 3: 6 gives 15;
level 4: 9, level 5: 5 and level 6: 1 gives 15;
level 7: 4, level 8: 3 and level 9: 8 gives 15.

Back on track (no pun intended): recursive backtracking in this case


For the Tetris Cube, we have three decision variables:
(1) the block used (there are 12 different blocks);
(2) the position where we put the block;
(3) the rotation of the block.

If we used position as the level, we would have 64 different levels (4x4x4). If we were able to place any remaining block at this position, we could immediately use the next unoccupied space as the position in the next recursive call (it makes no sense trying occupied positions, since they won't validate anyway).

If we used block as the level, we would have 12 different levels. If we are able to place this block at any available position in a specific rotation, we can move on to the next block. It seems more sensible to use this method: our tree will be shallower, and the decisions inside the function purely concern position and orientation in 3d space. Our function now looks like:

recursive_function(block) {
  for all positions 1 to 64 {
    for all orientations 1 to 24 {
      put our block at this position in this orientation
      if solution valid {
        if solution complete {
          done, show solution
        }else{
          recursive_function(next_block)
        }
      }
      (remove this block)
    }
  }
}


There are still a few questions left we have to answer, mainly: the different rotations, and how we can validate a partial solution. But first: let's take a look at the different blocks and how we can encode them.

The different blocks


There are 12 blocks, colored either red, blue or yellow (but the colors themselves mean nothing, they're just decoration). Since the 3d-modelling program Blender runs too slowly on my laptop (compositing enabled and such), I quickly whipped up a few lines of Povray code.


Did you know I can do animations too?

Now how do we encode them? Each block can be described by a set of tiny 1x1x1 sub-blocks placed along the x, y and z axes. I call the tiny block placed at (0,0,0) the pivot. Take the red L, for example; this block can be described as a combination of five little blocks:
{(0,0,0); (1,0,0); (2,0,0); (0,1,0); (0,2,0)}
(0,0,0) is the corner-block and our pivot.

Another way of describing this block with another pivot point:
{(0,0,0); (1,0,0); (2,0,0); (2,1,0); (2,2,0)}

As you can see, it doesn't matter which sub-block we use as our pivot block or how we place our x, y and z axes: since our function will try every position and every possible rotation, every combination gets tried anyway.

Now why are there 24 rotations? Every block can be placed facing the direction along: x positive, x negative, y positive, y negative, z positive and z negative, which gives 6 possible directions.
For every direction, we can rotate the block in 4 different ways (0°, 90°, 180° and 270°). 4x6=24. Easy.

Taking the description {(0,0,0); (1,0,0); (2,0,0); (0,1,0); (0,2,0)} and saying "put this block at position 34, facing y negative, rotated 90°" now just becomes a matter of translating and rotating objects in 3d. I won't describe the details here; it involves some matrix manipulations and such, but they are really simple when rotating in the 90° steps mentioned above. If you want to know more, take a look here or Google it.
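
To make the rotations concrete, here is a brute-force sketch (in Python, for brevity; the helper names are mine): compose quarter turns about each axis, shift each result so its minimum coordinate is (0,0,0), and deduplicate. As a bonus, symmetric pieces automatically yield fewer than 24 orientations.

from itertools import product

def rot_x(c):
    x, y, z = c
    return (x, -z, y)    # quarter turn about the x axis

def rot_y(c):
    x, y, z = c
    return (z, y, -x)    # quarter turn about the y axis

def rot_z(c):
    x, y, z = c
    return (-y, x, z)    # quarter turn about the z axis

def orientations(block):
    # Compose 0-3 quarter turns per axis (this over-generates; the
    # set deduplicates) and normalize to a (0,0,0) minimum corner.
    def normalized(cells):
        mx = min(c[0] for c in cells)
        my = min(c[1] for c in cells)
        mz = min(c[2] for c in cells)
        return tuple(sorted((x - mx, y - my, z - mz) for x, y, z in cells))
    seen = set()
    for i, j, k in product(range(4), repeat=3):
        cells = [tuple(c) for c in block]
        for _ in range(i):
            cells = [rot_x(c) for c in cells]
        for _ in range(j):
            cells = [rot_y(c) for c in cells]
        for _ in range(k):
            cells = [rot_z(c) for c in cells]
        seen.add(normalized(cells))
    return [list(o) for o in seen]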

In PHP, the language I was using (I know, I know - remember, I thought this would be a quick and easy job), a block description looks like:

// 1 1 1
// 1
// (1)
$blocks[] = array(array(0,0,0)
    ,array(0,1,0)
    ,array(0,2,0)
    ,array(1,2,0)
    ,array(2,2,0) );


Verifying partial solutions


Just as with our magic square, we can run a few checks after placing each block; two of them are necessary:
(1) a block may not overlap another block;
(2) a block may not go outside the bounds of the 4x4x4 block.

You could also get creative:
(3) a block may not be placed in such a way that we create an isolated empty space, that is, a space surrounded by occupied spaces (or by the bounds of our 4x4x4 cube).
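
A quick sketch of the two necessary checks in Python (names are mine; check (3) could be added on top with a flood fill over the empty cells):

def can_place(occupied, cells):
    # `occupied` is a set of (x,y,z) cells taken by earlier blocks,
    # `cells` the absolute cells the new block would occupy
    for x, y, z in cells:
        if not (0 <= x < 4 and 0 <= y < 4 and 0 <= z < 4):
            return False    # outside the 4x4x4 box
        if (x, y, z) in occupied:
            return False    # overlaps another block
    return True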

Trying the program


We're done! I was excited; I would beat this thing!

The program now looked like this (you can't read the code - believe me, it's a good thing):



The program was running dog slow. It was churning through lots of block-position-rotation combinations very fast, but there were simply too many of them. I thought I had made an error, and changed some parameters. I used a 2x2x2 bounding box with only three little pieces to fill it with. The program immediately displayed all the solutions (not that many, of course). It was working fine, but couldn't handle the large solution space of a 4x4x4 box. That's the problem with recursive backtracking: a "magic square" (see above) with 9 spaces is easy enough, but one with 10 spaces is a lot harder, and one with 11 spaces is even harder than the previous increment (the difficulty grows non-linearly). It was back to the drawing board again.

The following day...


The next day I decided to Google around a bit. It turns out that someone else had already solved this puzzle and posted his method online here. The author seems to like solving these kinds of puzzles. Most of his findings and descriptions are exactly the same as mine, so why did his method work? You can download his program (http://scottkurowski.com/tetriscube/TetrisCubeSolved.zip) and take a look at the source code yourself.

So why was his program working?
(1) C! It's a pity, but C is much, much faster than PHP, a factor we can't ignore here.
(2) He doesn't use a recursive backtracking function, but the general method is the same (a bunch of goto's, iterating over all the possible permutations to place the blocks one by one). It does backtrack, however.

It's a pretty, well-written program. I felt defeated, both by the puzzle and by another programmer. Since I didn't want to rewrite everything in C to see if a recursive method would prove successful there (I don't like C), I wanted to see if I could solve it using a "slow" language anyway: less brute-forcing, more being smart.

Searching for a better method


I recalled reading something about Dancing Links and how someone solved a Sudoku puzzle with them. "Sudoku and Tetris Cubes look alike", I thought. So I continued researching this.

First of all, we have "Algorithm X", an algorithm devised by Donald Knuth for solving exact cover problems. Exact cover problems are problems in which every constraint is an "exactly one" constraint. Knuth uses the Pentomino puzzle as an example (every piece should be used exactly once, and every space must be covered by a piece exactly once).

Also - more good news - Algorithm X is itself recursive and backtracking; it basically optimizes the way the recursion is done (see the linked Wikipedia page above to see how the algorithm works; make sure you understand it before continuing, it's quite easy and Wikipedia does a really good job of explaining it).

Dancing Links is a method to implement Algorithm X based on the fact that it is very easy to remove and re-add elements in a doubly linked list. Instead of using this, I first decided to try a simple Algorithm X implementation.

Basically, Algorithm X performs a few operations on a matrix in which each element is either zero (0) or one (1). A solution is then a set of rows such that each column contains exactly one one (1).
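
As an illustration, here is a compact, well-known way of expressing Algorithm X in Python with a dictionary of sets instead of an actual matrix (a sketch, not the program I used): X maps each column to the set of rows with a one in it, Y maps each row to its list of columns.

def solve(X, Y, solution=None):
    if solution is None:
        solution = []
    if not X:
        yield list(solution)    # all columns covered
    else:
        # pick the column with the fewest candidate rows
        c = min(X, key=lambda col: len(X[col]))
        for r in list(X[c]):
            solution.append(r)
            cols = select(X, Y, r)
            for s in solve(X, Y, solution):
                yield s
            deselect(X, Y, r, cols)    # backtrack
            solution.pop()

def select(X, Y, r):
    # remove every row that clashes with row r, and r's own columns
    cols = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        cols.append(X.pop(j))
    return cols

def deselect(X, Y, r, cols):
    # undo select(), restoring the columns in reverse order
    for j in reversed(Y[r]):
        X[j] = cols.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)

# A small exact cover instance; its unique solution is B, D, F:
Y = {'A': [1, 4, 7], 'B': [1, 4], 'C': [4, 5, 7],
     'D': [3, 5, 6], 'E': [2, 3, 6, 7], 'F': [2, 7]}
X = {}
for row, cols in Y.items():
    for j in cols:
        X.setdefault(j, set()).add(row)
for s in solve(X, Y):
    print(sorted(s))    # -> ['B', 'D', 'F']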

Take the Pentomino puzzle for example (if you aren't familiar with the puzzle, take a look here). It is basically the same problem as our Tetris Cube, except that it is in 2d instead of 3d. We construct a matrix as follows (from Wikipedia):


There are two kinds of constraints:
(1) for each of the 12 pieces, there is the constraint that it must be placed exactly once. Name these constraints after the corresponding pentominoes: F I L P N T U V W X Y Z.
(2) for each of the 60 spaces, there is the constraint that it must be covered by a pentomino exactly once.

Thus there are 12+60=72 constraints in total.

The problem involves many choices, one for each way to place a pentomino on the board. It is convenient to consider each choice as a set of 6 constraints: 1 constraint for the pentomino being placed and 5 constraints for the five squares it covers. For example:
{F, 12, 13, 21, 22, 32}
{F, 13, 14, 22, 23, 33}
{I, 11, 12, 13, 14, 15}
{L, 12, 22, 32, 42, 43}


One of many solutions of this exact cover problem is the following set of 12 choices:
{I, 11, 12, 13, 14, 15}
{N, 16, 26, 27, 37, 47}
{L, 17, 18, 28, 38, 48}
{U, 21, 22, 31, 41, 42}
{X, 23, 32, 33, 34, 43}
{W, 24, 25, 35, 36, 46}
{P, 51, 52, 53, 62, 63}
{F, 56, 64, 65, 66, 75}
{Z, 57, 58, 67, 76, 77}
{T, 61, 71, 72, 73, 81}
{V, 68, 78, 86, 87, 88}
{Y, 74, 82, 83, 84, 85}



So, in our case, we need a row for every choice we can make, that is: all the combinations of every piece and every possible way of placing it. Our program finds every possible valid placement of every block and constructs rows in the matrix:

for every block {
  for every position {
    for every rotation {
      try this; if this placement is possible (i.e. it does not exceed the boundaries of the 4x4x4 box), add it as a row: {BLOCK; pos1, pos2, pos3, ...}
    }
  }
}


Now our columns: we have 12 blocks, and 64 positions these blocks can occupy. 12+64=76. For every row, we:
(1) set a one (1) in one of the first twelve columns according to the block this row uses;
(2) set a one (1) in every "position" column this row occupies.
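
Putting the earlier sketches together, constructing those rows could look like this (again a sketch with my own names, reusing the orientations() helper from above; each row is stored sparsely as a list of column indices, so Y = dict(enumerate(build_rows(blocks))) would feed the solve() sketch directly):

def build_rows(blocks):
    # One exact-cover row per valid placement. Columns 0..11 identify
    # the block; cell (x,y,z) maps to column 12 + x + 4*y + 16*z.
    rows = []
    for b, block in enumerate(blocks):
        for cells in orientations(block):
            max_x = max(c[0] for c in cells)
            max_y = max(c[1] for c in cells)
            max_z = max(c[2] for c in cells)
            # slide the oriented block to every offset that keeps it
            # inside the 4x4x4 box
            for dx in range(4 - max_x):
                for dy in range(4 - max_y):
                    for dz in range(4 - max_z):
                        rows.append([b] + sorted(
                            12 + (x + dx) + 4 * (y + dy) + 16 * (z + dz)
                            for x, y, z in cells))
    return rows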

After quickly throwing some ugly code together, I ran the program. First, it constructs the choices and the matrix:

Constructing choices:
0: ................................................................
1: ................................................................
2: ................................................................
3: ................................................................
4: ................................................................
5: ................................................................
6: ................................................................
7: ................................................................
8: ................................................................
9: ................................................................
10: ................................................................
11: ................................................................
Number of choices/rows: 4488 out of 18432 theoretically possible choices
Constructing matrix: .................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Done, number of rows: 4488 and cols: 76


The matrix itself is a bit large to display here. But here are the first few rows:

OUTPUTTING MATRIX:
0 {0; 0; 1; 2; 6; 7; }: 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 {0; 0; 1; 2; 18; 19; }: 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 {0; 0; 4; 8; 9; 13; }: 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 {0; 0; 16; 32; 33; 49; }: 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 {0; 0; 4; 8; 24; 28; }: 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 {0; 0; 16; 32; 36; 52; }: 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
6 {0; 1; 5; 9; 10; 14; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 {0; 1; 17; 33; 34; 50; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
8 {0; 1; 5; 9; 8; 12; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 {0; 1; 17; 33; 32; 48; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 {0; 1; 5; 9; 25; 29; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 {0; 1; 17; 33; 37; 53; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
12 {0; 2; 6; 10; 11; 15; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 {0; 2; 18; 34; 35; 51; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
14 {0; 2; 6; 10; 9; 13; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 {0; 2; 18; 34; 33; 49; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 {0; 2; 6; 10; 26; 30; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 {0; 2; 18; 34; 38; 54; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
18 {0; 3; 2; 1; 5; 4; }: 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...


As you can see, in these rows the first block (number zero) is used, so the first column is one (1) and the next eleven columns are zero (0). The following 64 columns are one if the row occupies that position.

The program then gives some verbose information:
==> (0) Number of rows: 4488 and cols: 76
Picking 12, has 117
117 rows to do for column
==> (1) Number of rows: 3414 and cols: 70
Picking 15, has 22
22 rows to do for column
==> (2) Number of rows: 2025 and cols: 64
Picking 17, has 45
45 rows to do for column
==> (3) Number of rows: 1156 and cols: 57
Picking 26, has 10
10 rows to do for column
...
==> (10) Number of rows: 9 and cols: 12
Picking 4, has 3
3 rows to do for column
==> (11) Number of rows: 0 and cols: 0
<== Column has zero ones. ==> (11) Number of rows: 0 and cols: 0
<== Column has zero ones. ==> (11) Number of rows: 0 and cols: 0
<== Column has zero ones. ==> (10) Number of rows: 10 and cols: 12
Picking 4, has 2
2 rows to do for column
==> (11) Number of rows: 0 and cols: 0
<== Column has zero ones. ==> (11) Number of rows: 2 and cols: 6
Picking 10, has 2
2 rows to do for column
==> (12) Number of rows: 0 and cols: 0


And finally, a first solution is found, alongside a simple presentation to try it on the cube:

All done, here is a list of choices:
CHOICE NR 0: 0 0 1 2 6 7
CHOICE NR 393: 1 19 3 23 22 21
CHOICE NR 741: 2 5 9 13 29 30 45
CHOICE NR 1113: 3 15 31 14 11 10
CHOICE NR 2693: 7 4 8 12 28 44
CHOICE NR 2371: 5 63 62 46 61 60
CHOICE NR 3328: 8 58 47 42 43 39 27
CHOICE NR 3621: 9 38 51 54 55 59
CHOICE NR 4427: 11 49 53 50 34 35 18
CHOICE NR 2635: 6 56 57 40 24 20 16
CHOICE NR 2004: 4 48 52 32 33 17
CHOICE NR 3938: 10 26 25 41 37 36

Solution found:
Plane 0:
3 3 2 7
3 3 2 7
0 0 2 7
1 0 0 0
Plane 1:
3 2 2 7
8 10 10 6
1 1 1 6
1 11 4 6
Plane 2:
8 5 2 7
8 8 10 6
8 9 10 10
11 11 4 4
Plane 3:
5 5 5 5
9 8 6 6
9 9 11 4
9 11 11 4


Which looks like this:



Ha, gotcha! If we used Dancing Links as well, and a faster or better-suited language, this would go even faster.

Thursday, February 05, 2009

Modern Genetic (And Other) Algorithms Explained: Part 7 - Conclusion

(This is the final part in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

In the past series, we have looked at genetic algorithms, simulated annealing, ant colony optimization and tabu search. Each method has its pros and cons, and there certainly is room for exploration left.

I hope you've enjoyed this series, or that it has at least sparked your interest in the topic.

Here are a few more general resources, if you want to know more:

Other related techniques:
Books:
A specialized event was held in the USA: Foundations Of Genetic Algorithms. You can find a lot of interesting information on their website.
Finally, I want to mention a software project which looks promising: Paradiseo:
A white-box object-oriented framework, portable (Windows, Unix and MacOS), dedicated to the flexible design of metaheuristics: solution-based metaheuristics (Local search, Simulated annealing, Iterated local search, Tabu search, ...) and population-based metaheuristics (Genetic algorithm, Particle swarm optimization, Evolution strategy, Differential evolution algorithms, ...).

-----
Table Of Contents (click a link to jump to that post)

Thursday, January 29, 2009

Modern Genetic (And Other) Algorithms Explained: Part 6 - Tabu Search

(This is part 6 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

Tabu search is not really "modern" anymore, but it is still widely used.

The pseudocode looks like this:

Construct an initial solution
Until time limit hit or satisfiable solution found do:
    Find all neighbours which are not in the tabu list, and calculate their score
    Pick the best neighbour, add the previous solution to the tabu list

Notice that the "best neighbour" does not have to be better than the current solution, or than the best solution found so far.

Maintaining a tabu list can be time- and memory-consuming. Alternatively, we could construct the list like this: add the difference between two successive solutions to the list (so that the step cannot be undone), and keep it in the list for n iterations. N, the length of the list, is important: make it too long and the algorithm might get stuck; make it too short and the algorithm will tend towards local maxima.

This time, I've coded the example in PHP, I hope nobody minds:

$target = 300;

//construct an initial solution
$tabulist = array('ttl' => array(), 'change' => array());
$solution = array();
$min = 0;
$max = 100;
$iterations = 0; //iteration counter
for ($i=0;$i<6;$i++)
    $solution[] = rand($min,$max);
  
//until solution found
while (true){
    $best_neighbour_solution = false;
    $best_neighbour_score = 1000;
    $best_neighbour_tabu = false;
    for ($position=0;$position<6;$position++){
        if (!in_array("$position+",$tabulist['change'])
                and $solution[$position] < $max){
            $temp_solution = $solution;
            $temp_solution[$position]++;
            $score = fitness($temp_solution,$target);
            if ($score < $best_neighbour_score){
                $best_neighbour_score = $score;
                $best_neighbour_solution = $temp_solution;
                //make sure this step doesn't get undone
                $best_neighbour_tabu = "$position-";
            }
        }
        if(!in_array("$position-",$tabulist['change'])
                and $solution[$position] > $min){
            $temp_solution = $solution;
            $temp_solution[$position]--;
            $score = fitness($temp_solution,$target);
            if ($score < $best_neighbour_score){
                $best_neighbour_score = $score;
                $best_neighbour_solution = $temp_solution;
                //make sure this step doesn't get undone
                $best_neighbour_tabu = "$position+";
            }
        }
    }
    //pick the best neighbour
    $solution = $best_neighbour_solution;
    $fitness = fitness($solution,$target);
    echo "Iteration $iterations: fitness = $fitness\n";
    if ($fitness == 0){
        echo "Perfect solution found:\n";
        print_r($solution);
        die;
    }
    //add change to tabu list
    $tabulist['ttl'][$iterations] = 5; //remember for 5 iterations
    $tabulist['change'][$iterations] = $best_neighbour_tabu;
    //update the tabulist
    foreach ($tabulist['ttl'] as $key => &$val){
        $val--;
        if ($val <= 0){
            unset($tabulist['ttl'][$key]);
            unset($tabulist['change'][$key]);
        }
    }
    echo "Iteration $iterations: tabulist now contains "
        .count($tabulist['ttl'])." items \n";
    $iterations++;
}


function fitness($array, $target){
    return abs(array_sum($array)-$target);
}

The neighbour calculation could be done a little better (there's a bit of ugly duplicate code), but it gives the following output:

Iteration 0: fitness = 57
Iteration 0: tabulist now contains 1 items
Iteration 1: fitness = 56
Iteration 1: tabulist now contains 2 items
Iteration 2: fitness = 55
Iteration 2: tabulist now contains 3 items
Iteration 3: fitness = 54
Iteration 3: tabulist now contains 4 items
Iteration 4: fitness = 53
Iteration 4: tabulist now contains 4 items
Iteration 5: fitness = 52
Iteration 5: tabulist now contains 4 items
...
Iteration 55: tabulist now contains 4 items
Iteration 56: fitness = 1
Iteration 56: tabulist now contains 4 items
Iteration 57: fitness = 0
Perfect solution found:
Array
(
    [0] => 66
    [1] => 14
    [2] => 20
    [3] => 99
    [4] => 14
    [5] => 87
)

With this sample problem, such an output was, of course, expected. Notice that I've used the second method of keeping a tabu list.

This post almost concludes this series; there is only one post to go, with a general conclusion.



-----
Table Of Contents (click a link to jump to that post)

Thursday, January 22, 2009

Modern Genetic (And Other) Algorithms Explained: Part 5 - Ant Colony Optimization

(This is part 5 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

First, let's go to Wikipedia for a definition:
The ant colony optimization algorithm (ACO), is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs.

This algorithm is a member of Ant colony algorithms family, in Swarm intelligence methods, and it constitutes some metaheuristic optimizations. Initially proposed by Marco Dorigo in 1992 in his PhD thesis [1] [2] , the first algorithm was aiming to search for an optimal path in a graph; based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since diversified to solve a wider class of Numerical problems, and as a result, several problems have emerged, drawing on various aspects of the behavior of ants.

"Good paths through graphs". I have not converted our sample problem from the previous posts to a graphing problem. So no Python sample this time, sorry.

You do get to see some pseudocode though:
Given a number of nodes
Given some edges between nodes (paths)
Given BT := {}, this will contain our best tour
Randomly construct a tour T
Create a number of "ants" (often equal to number of nodes)
Associate a distance, pheromone value, and delta pheromone value to every edge
Iterate until time limit hit or satisfiable solution found:
  For each ant do:
    Do until tour constructed:
      Next node is chosen depending on visibility (e.g. 1/distance) and pheromone trail
      E.g. choose next node with probability (visibility^a)*(pheromone trail^b)
    Calculate fitness of this tour
    Copy this tour to the best tour if fitness is better
    Update the pheromone trail of each edge of this ant's tour:
      E.g. delta pheromone for edge := 1/(tour cost)
  For each edge:
    Lower pheromone value by a factor
    Add delta pheromone value to pheromone value
    Set delta pheromone := 0
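
Purely as an illustration of this pseudocode (not our running sample problem), a minimal Ant System sketch for a small travelling salesman instance might look as follows; the parameter names (a and b for the pheromone/visibility exponents, rho for the evaporation factor) follow the usual conventions, and all defaults are just illustrative:

import random

def ant_colony(dist, n_ants=20, n_iter=100, a=1.0, b=3.0, rho=0.5):
    # `dist` is a symmetric distance matrix (list of lists)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]    # pheromone per edge
    best_tour, best_cost = None, float('inf')
    for _ in range(n_iter):
        delta = [[0.0] * n for _ in range(n)]
        for _ant in range(n_ants):
            # construct a tour: pick the next node with probability
            # proportional to (pheromone^a) * (visibility^b)
            tour = [random.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                choices = [j for j in range(n) if j not in tour]
                weights = [(tau[i][j] ** a) * ((1.0 / dist[i][j]) ** b)
                           for j in choices]
                r = random.uniform(0, sum(weights))
                nxt = choices[-1]
                for j, w in zip(choices, weights):
                    r -= w
                    if r <= 0:
                        nxt = j
                        break
                tour.append(nxt)
            cost = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            if cost < best_cost:
                best_tour, best_cost = list(tour), cost
            for k in range(n):    # lay delta pheromone on this tour
                i, j = tour[k], tour[(k + 1) % n]
                delta[i][j] += 1.0 / cost
                delta[j][i] += 1.0 / cost
        for i in range(n):        # evaporate, then add new deposits
            for j in range(n):
                tau[i][j] = (1 - rho) * tau[i][j] + delta[i][j]
    return best_tour, best_cost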

ACO is a very interesting method; again, we see ideas we've already encountered in the previous posts, such as mechanisms (here: evaporating pheromone trails) to escape local maxima. Since the algorithm is closely tied to graphs, nodes and paths, it's no wonder it's often used to find shortest paths, or to solve the travelling salesman problem.

There are some interesting links on the Wikipedia page, one of them is this application:
After a while, an ant finds a path to the food and places pheromones:

Those who are interested can read more about Swarm intelligence or Particle swarm optimization.


-----
Table Of Contents (click a link to jump to that post)

Monday, January 19, 2009

Modern Genetic (And Other) Algorithms Explained: Part 4 - Simulated Annealing

(This is part 4 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

First of all: Simulated Annealing is not a genetic algorithm, but it is a modern optimization technique.

Wikipedia tells us the following:
Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global minimum of a given function in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities). For certain problems, simulated annealing may be more effective than exhaustive enumeration — provided that the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.
The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one.
Let's take a look at some pseudocode:

Randomly construct a valid solution
For each temperature do:
  For number of trials to do at each temperature do:
    Move solution to a neighbour
    Accept neighbour with probability(old_score, new_score, temperature)
  Lower the temperature by a reduction factor

It becomes clear that this method requires a way to construct neighbours of our current state, which makes it a natural fit for discrete optimization problems. E.g., a first solution in the problem we've been discussing might be [10,51,5,18] with a neighbour [10,52,5,18].

Also, a probability function needs to be defined. Some functions are constructed so that better solutions are always accepted. Ideally, we assign a probability so that worse solutions have a chance of being accepted too (or better ones a chance of being rejected). Again: this is to avoid local maxima (comparable with GAs).

Often, a worse solution is accepted with an exponential probability: exp( (old_score-new_score)/temperature ). Remember that lower scores are better here, so for a worse neighbour this exponent is negative and the probability drops below 1.

Note: the pseudocode on the Wikipedia page doesn't use trials; for some problems, that is good enough.

Let's see some Python code (again, based on the code mentioned in the previous posts):

from random import randint, random
from operator import add
from math import *

def individual(length, min, max):
  'Create a member of the population.'
  return [ randint(min,max) for x in xrange(length) ]

def fitness(individual, target):
  """
  Determine the fitness of an individual. Lower is better.
  individual: the individual to evaluate
  target: the target number individuals are aiming for
  """
  sum = reduce(add, individual, 0)
  return abs(target-sum)

def probability(o_fitness, n_fitness, temperature):
  if n_fitness < o_fitness:
    return 1
  # worse neighbours are accepted with a probability below 1 that
  # shrinks as the temperature drops (lower fitness is better here)
  return exp( (o_fitness-n_fitness) / temperature)

def temperature(step, max_steps):
  return max_steps - step

def neighbour(ind, min, max):
  pos_to_mutate = randint(0, len(ind)-1)

  if random() < 0.5:
    ind[pos_to_mutate] -= 1
  else:
    ind[pos_to_mutate] += 1

  if ind[pos_to_mutate] < min:
    ind[pos_to_mutate] = min
  elif ind[pos_to_mutate] > max:
    ind[pos_to_mutate] = max

  return ind

def evolve(ind, nr_trials, step, max_steps, min, max, target):
  best_fit = 10000
  for i in range(1,nr_trials):
    # work on a copy: neighbour() mutates the list it is given
    n_ind = neighbour(ind[:], min, max)
    o_fitness = fitness(ind,target)
    n_fitness = fitness(n_ind,target)

    if n_fitness < best_fit:
      best_fit = n_fitness

    #move to new state?
    if probability(o_fitness, n_fitness, temperature(step,max_steps)) >= random():
      ind = n_ind

  print "Best fitness this evolution:",best_fit
  print "Temperature this evolution:",temperature(step,max_steps)
  return ind

If the fitness of the neighbour is better (remember: that means lower), we immediately accept it; we don't really need a chance of rejection for this problem. Otherwise, we use exp( (o_fitness-n_fitness) / temperature).
We use a separate function to calculate the temperature for each step. This function is kept fairly simple (max steps - this step), but a non-linear temperature schedule could also be implemented.

Let's try it; our starting temperature is 100, using 1000 trials per temperature:

import sys
sys.path.append("C:\Users\Seppe\Desktop")
from annealing import *
target = 300
i_length = 6
i_min = 0
i_max = 100
i = individual(i_length, i_min, i_max)
print fitness(i, target)
i_k = 0
i_kmax = 100
i_trials = 1000
while i_k < i_kmax:
  i = evolve(i, i_trials, i_k, i_kmax, i_min, i_max, target)
  i_k += 1

The output:
Best fitness this evolution: 34
Temperature this evolution: 100
Best fitness this evolution: 7
Temperature this evolution: 99
Best fitness this evolution: 0
Temperature this evolution: 98
Best fitness this evolution: 44
Temperature this evolution: 97
Best fitness this evolution: 43
Temperature this evolution: 96
Best fitness this evolution: 33
Temperature this evolution: 95
Best fitness this evolution: 39
Temperature this evolution: 94
Best fitness this evolution: 44
Temperature this evolution: 93
Best fitness this evolution: 34
Temperature this evolution: 92
Best fitness this evolution: 36
Temperature this evolution: 91
Best fitness this evolution: 50
Temperature this evolution: 90
Best fitness this evolution: 68
Temperature this evolution: 89
Best fitness this evolution: 67
Temperature this evolution: 88
Best fitness this evolution: 53
Temperature this evolution: 87
Best fitness this evolution: 49
Temperature this evolution: 86
Best fitness this evolution: 35
Temperature this evolution: 85
Best fitness this evolution: 55
Temperature this evolution: 84
Best fitness this evolution: 5
Temperature this evolution: 83
Best fitness this evolution: 0

Simulated annealing is easy to program and implement, provided it is easy to construct neighbours, a sensible combination of temperature and probability functions can be found, and the number of trials is well chosen. Still: this method might be too naive to solve more difficult problems.

The source code can be downloaded here.

An interesting implementation is this. It is based on another genetic programming experiment located here (both are very interesting and fun examples, I highly recommend reading them).

Another Java applet to look at: solving a travelling salesman problem with simulated annealing.




-----
Table Of Contents (click a link to jump to that post)

Tuesday, January 13, 2009

Modern Genetic (And Other) Algorithms Explained: Part 3 - CHC Eshelman

(This is part 3 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

In this part, we will look at a "special" genetic algorithm, CHC by Eshelman. The original paper is very hard to find, but it is mentioned in a lot of other works.

The pseudocode looks like this (thanks to the commenters for sorting some problems out):
Create initial population: P
L := length of an individual chromosome
N := population size
Threshold := MutProb * (1.0-MutProb) * L (or L/4 is also used)
Evolution until time limit hit or satisfiable solution found:
  CPop := {}
  For i := 1 to N/2 do:
    Choose two random parents: P1 and P2
    If (different bits between P1 and P2) / 2 > threshold do:
      Create children C1 and C2 using half-uniform crossover
      Add C1 and C2 to CPop

  If there are no children in CPop:
    Threshold := threshold - 1
  Else:
    P := best N individuals from P and CPop

  If threshold < 0:
    Cataclysmic creation of new population P
    Threshold := MutProb * (1.0-MutProb) * L (or L/4 is also used)


There is another pseudocode description in the slides found here.

A few interesting remarks:
(1) N (population size) is almost always set to 50, but can range from 50 to 10000.
(2) The different bits between P1 and P2 can be defined as the hamming distance between them.
(3) Half uniform crossover swaps exactly half of the non-matching bits. However, often a uniform crossover is used, with a chance of 0.5 - 0.8 of swapping.
(4) MutProb is the mutation probability and originally set to 0.35 (35%).
(5) A "cataclysmic event" occurs when there are no children created for a certain period of time. New children can only be made between parents which are different enough. Basically this means: whenever the population converges towards a certain points, a cataclysm occurs.
(6) What this cataclysm will do depends on the actual implementation. Originally, and often, the following method is used: take the single best individual, and put it in the new population. Now, mutate each of its bits with a 35% chance, this will be the second individual. Repeat this to create a new population of size N. Sometimes, the mutation chance is set to a higher value.

My Python implementation - again, based on the code found here - looks as follows (the original problem is unchanged):
from random import randint, random
from operator import add

def individual(length, min, max):
  return [ randint(min,max) for x in xrange(length) ]

def population(count, length, min, max):
  return [ individual(length, min, max) for x in xrange(count) ]

def fitness(individual, target):
  sum = reduce(add, individual, 0)
  return abs(target-sum)

def grade(pop, target):
  summed = reduce(add, (fitness(x, target) for x in pop))
  return summed / (len(pop) * 1.0)

def hamming(ind1, ind2):
  nr_hamming = 0
  for x in range(0, len(ind1)):
    if (ind1[x] != ind2[x]):
      nr_hamming += 1
  return nr_hamming

def cataclysm(pop, target, min, max):
  #keep the best individual, re-randomize 35% of the values to get new individuals
  pop.sort (lambda x, y : cmp (fitness(x, target), fitness(y, target)))
  firstind = pop[0]
  newpop = [firstind]
  for x in range(1, len(pop)):
    nextind = firstind[:]
    for i in range(0, len(nextind)):
      if 0.35 > random():
        nextind[i] = randint(min, max)
    newpop.append(nextind)
  return newpop

def hux(ind1, ind2):
  child_one = []
  child_two = []
  hamming_dist = hamming(ind1, ind2)
  nr_swaps = 0
  for x in range(0, len(ind1)):
    if (ind1[x] == ind2[x]) or (random() > 0.5) or (nr_swaps >= hamming_dist / 2):
      #same, just copy to both
      child_one.append(ind1[x])
      child_two.append(ind2[x])
    else:
      #different, swap with .5 probability, until hamming/2 swaps
      nr_swaps += 1
      child_one.append(ind2[x])
      child_two.append(ind1[x])
  return [child_one,child_two]

def evolve(pop, target, min, max, treshold):
  child_population = []
  for i in range(0, len(pop)/2):
    #choose two random parents:
    parent_one = pop[randint(0, len(pop)-1)]
    parent_two = pop[randint(0, len(pop)-1)]
    if (hamming(parent_one, parent_two)/2) > treshold[0]:
      #do hux crossover
      children = hux(parent_one, parent_two)
      child_population.append(children[0])
      child_population.append(children[1])
  if len(child_population) == 0:
    treshold[0]-=1;
    print "No children evolved"
  else:
    p_count = len(pop);
    print len(child_population),"children"
    for x in child_population:
      pop.append(x)
    pop.sort (lambda x, y : cmp (fitness(x, target), fitness(y, target)))
    #for x in pop:
    # if fitness(x,target) == 0:
    # print "Perfect individual found:",x
    pop = pop[:p_count]
    print len(pop),"new population, grade:", grade(pop, target)
  if treshold[0] < 0:
    pop = cataclysm(pop, target, min, max)
    print "Cataclysm, newpop length:",len(pop),"grade:",grade(pop,target)
    treshold[0] = len(pop[0]) / 4.0
    print "Treshold is now:",treshold[0]
  return pop

This reminds me: I should really work on a css class for code (update: done), instead of writing everything in monospace. A few remarks:
(1) The implementation is a bit hacky. Python passes object references around, but immutable objects (like floats) can't be changed in place. I wanted to pass treshold by reference, which did not work, it being a float and such; that's why I've wrapped it in a list.
(2) I'll use L/4 as the threshold, and I still use a 35% mutation rate, even though we are not using bit-encoded individuals; we could set this a bit higher if we wanted.
(3) We do crossover by randomly swapping differing values with a 0.5 chance, until half of the values are swapped. Probability-wise, this is not the same as randomly picking half of the differing bits, but it doesn't matter that much for this example.

Let's test it:
import sys
sys.path.append("C:\Users\Seppe\Desktop")
from chc_eshelman import *
target = 300
p_count = 50
i_length = 6
i_min = 0
i_max = 100
treshold = [i_length / 4.0]
p = population(p_count, i_length, i_min, i_max)
print "First grade: ",grade(p, target)
for i in range(0,100):
  p=evolve(p, target, i_min, i_max, treshold)

In the first run, it took two cataclysms to reach a completely perfect population (grade is the average score of the complete population, not of the single best individual; a perfect individual might already appear in the first evolution, but because this problem is so simple, we look at the complete population):
First grade: 66.88
48 children
50 new population, grade: 30.34
44 children
50 new population, grade: 18.92
48 children
50 new population, grade: 10.64
38 children
50 new population, grade: 6.68
40 children
50 new population, grade: 4.74
36 children
50 new population, grade: 3.84
12 children
50 new population, grade: 3.48
12 children
50 new population, grade: 3.12
6 children
50 new population, grade: 3.0
No children evolved
No children evolved
Cataclysm, newpop length: 50 grade: 48.24
Treshold is now: 1.5
46 children
50 new population, grade: 17.36
36 children
50 new population, grade: 7.8
32 children
50 new population, grade: 4.1
20 children
50 new population, grade: 2.76
14 children
50 new population, grade: 2.44
16 children
50 new population, grade: 2.12
22 children
50 new population, grade: 1.68
20 children
50 new population, grade: 1.28
18 children
50 new population, grade: 1.0
No children evolved
No children evolved
Cataclysm, newpop length: 50 grade: 48.86
Treshold is now: 1.5
40 children
50 new population, grade: 21.04
46 children
50 new population, grade: 5.3
36 children
50 new population, grade: 1.56
40 children
50 new population, grade: 0.38
32 children
50 new population, grade: 0.0

Another run only takes five evolutions to reach a perfect population, with a beautiful convergence:
First grade: 51.16
46 children
50 new population, grade: 24.26
46 children
50 new population, grade: 12.6
34 children
50 new population, grade: 5.78
38 children
50 new population, grade: 0.94
34 children
50 new population, grade: 0.0
20 children
50 new population, grade: 0.0

Sometimes however, the algorithm gets stuck in a loop:
...
18 children
50 new population, grade: 1.0
22 children
50 new population, grade: 1.0
24 children
50 new population, grade: 1.0
16 children
50 new population, grade: 1.0
26 children
50 new population, grade: 1.0
24 children
50 new population, grade: 1.0
24 children
50 new population, grade: 1.0
18 children
50 new population, grade: 1.0
18 children
50 new population, grade: 1.0
14 children
50 new population, grade: 1.0
16 children
50 new population, grade: 1.0
16 children
50 new population, grade: 1.0

This has to do with the way the hamming distance is calculated. Sometimes a pool of just two distinct solutions emerges which differ in more than one position; their hamming distance then always stays above the threshold, but they keep creating the same children, and the same resulting new population.

For example, the algorithm can get stuck in a pool with two types of parents:
1: [83, 19, 67, 64, 23, 44], sum 300
(the target was 300, so fitness: 300-300 = 0: perfect)
2: [38, 28, 67, 64, 6, 97], sum 300


Both are optimal (but the sums might also both be 299, or 299 and 301, etc.)... Notice that the hamming distance between them is four, far above the threshold, so the following children can be created:
[38, 28, 67, 64, 23, 44], sum 264
[38, 19, 67, 64, 6, 97], sum 291
[83, 28, 67, 64, 6, 44], sum 292
[83, 19, 67, 64, 6, 97], sum 336


However, these children all perform worse and will never make it into the new population, and this is how we get stuck in a loop.

If we'd used a bit representation, or other workarounds, this would've worked better. For example, we could use an extra check: if the population members don't change, do threshold := threshold - 1. Still, it's good enough to show the workings of the algorithm.

In conclusion, CHC performs very well with only a very limited population size, even in problems where local maxima are common.

If you want to download the source code used in this post, you can find it here.



-----
Table Of Contents (click a link to jump to that post)

Saturday, January 10, 2009

Modern Genetic (And Other) Algorithms Explained: Part 2 - Genetic Algorithms

(This is part 2 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

Let's take another look at the blog post on which we will base ourselves. A general genetic algorithm can be described as follows:

Construct initial population
Repeat until time limit hit or satisfiable solution found:
  Assign a fitness score to each individual
  Use a selection method to pick individuals for reproduction
  Construct a new population, using a crossover method
  Use a mutation method to mutate the new population

There are still a lot of gaps to fill in: how large should the population be, how do we construct individuals, which fitness function do we use, which selection methods are available, how do we evolve the population, should we mutate, and how?

Population and Individuals: Encoding and Construction
Let's start with the population: if the population count is too low, there will not be enough diversity between the individuals to find an optimal or good solution. If the count is too high, the algorithm will execute more slowly, and convergence to a solution might slow down as well. There is no 'golden rule' for finding an optimal number.

How do we construct individuals? A simple (and popular) choice is the bit representation:

0110 0111 0001 1011 ...

When evaluating the individuals, those bits are converted into something meaningful, be it numbers, strings, or other symbols, depending on the problem.

If we had used this representation, an individual for the problem described in the article (find x numbers each between a and b so that the summation of those numbers equal z) would look like this (e.g. x = 4):

01100111 01000101 11010101 11010111

Which would represent 4 numbers between zero and 255: 103+69+213+215. Also notice that our individuals have a fixed length. Variable length is a bit more complex, but also possible, and sometimes needed.
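
Decoding such a bitstring back into integers is straightforward; a quick sketch (the helper name is mine):

def decode(bits, width=8):
    # split a flat list of bits into integers of `width` bits each
    return [int(''.join(str(b) for b in bits[i:i + width]), 2)
            for i in range(0, len(bits), width)]

# decode([0,1,1,0,0,1,1,1,  0,1,0,0,0,1,0,1]) -> [103, 69]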

What do we do if the number lies out of the permitted bounds? We could:
(1) Avoid that situation by making sure we never generate an individual with a number < a or > b.
(2) Permit those individuals, but assign a very low fitness score to them. In the article, the score is calculated as the absolute difference between the sum of the individual's numbers and the target value, so a score of zero (0) means perfect fitness: a valid solution. In this case, invalid individuals would get a score of 1000, for example. Or even better, we evaluate them normally, but add a "penalty", e.g. the absolute difference between the bounds and the numbers, times two.
Why would we use (2)? In some cases it is too difficult or too time-consuming to check every individual to see if it is valid, before putting it in the population pool. Also: permitting invalid individuals might still be useful. They can introduce more variety in the population, and their valid parts can still be used to create offspring.

Why would we use this representation? As we will see in the following steps, a crossover between two individuals will happen to create new offspring; with a bit representation, the crossover point can be placed anywhere:

             v
01100111 010 | 00101 11010101 11010111
01101111 000 | 00100 10010111 10011100


When we use a list of integers, we place the crossover point between two numbers. This is different from the previous example, where the crossover point could be arbitrarily placed "in the middle of a number":

    v
103 | 69 , 213 , 215
13  | 22 , 123 , 76


Bit representation is thus often used when it is not clear how to place the crossover point, but we do know a way to "convert" a solution to a bitstring. However, always watch out: some representations are too sensitive to "random" tinkering with bits, quickly making them invalid. In our case, using a bit representation would be permitted (changing a random bit still yields four integers), but in other cases this method becomes infeasible.

Selection
Good, we have now constructed an initial population, containing, say, 100 individuals. We also have a fitness function to assign a score to each of them. Lower is better, 0 being a perfect solution.

How do we pick individuals to create offspring with? We'll need two individuals (a father and a mother). A first way (1) to choose them might be: just pick them at random. Of course, it is easy to see that this is a bad way of choosing parents. There has to be a relation between fitness and selection, so that better offspring can be created. When we choose parents at random, bad parents have an equal chance of producing children as fitter parents. (This goes against the 'survival of the fittest' idea, on which genetic algorithms are based.)


In the article, a second (2) method is used:

Pick the n-best individuals from our current population
Pick two different random individuals from those n-best
Use those as parents for a child

This is certainly a better option, but this way we completely abandon the lower-scoring individuals. It is always better to give even the worst individuals a little chance of being a parent, to maintain a higher level of diversity.


With that remark in mind, we arrive at a third (3) solution, the roulette-wheel selection method: the chance of each individual being selected becomes: fitness of that individual / total of the fitness scores of every individual in the population. For example, consider the following population of five individuals, ordered by their score (let's say that higher is better).

i3: 9
i2: 7
i5: 6
i4: 3
i1: 1
Total: 26


Now pick a random value between zero and 26. Let's say 8. Starting from the bottom: 8 > 1, continue; 8 > 1 + 3, continue; 8 < 1 + 3 + 6 = 10, so we pick i5. It's easy to see where the name "roulette selection" comes from.
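
A minimal sketch of roulette-wheel selection in Python (names are mine; as in the example above, higher scores are better):

from random import uniform

def roulette_select(pop, scores):
    # selection chance is proportional to score
    r = uniform(0, sum(scores))
    for ind, score in zip(pop, scores):
        r -= score
        if r <= 0:
            return ind
    return pop[-1]    # guard against floating-point rounding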


Another selection method (4) is called tournament selection:
Choose k (the tournament size) individuals from the population at random
  Choose the best individual from pool/tournament with probability p
  Choose the second best individual with probability p*(1-p)
  Choose the third best individual with probability p*((1-p)^2)
  ...

Note that p can be 1; then the best out of k individuals is chosen, which is fairly common. For k, often 2, 3, or 4 is used. This is an often-used method because it is easy to implement and can be used in parallel environments. Note that with p = 1 and k = 1 this method essentially becomes pure random selection.
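
A sketch of tournament selection along the same lines (again with my own names; higher scores are better):

from random import sample, random

def tournament_select(pop, scores, k=3, p=1.0):
    # pick k random contenders; take the best with probability p,
    # else fall through to the second best, and so on
    contenders = sorted(sample(range(len(pop)), k),
                        key=lambda i: scores[i], reverse=True)
    for i in contenders:
        if random() < p:
            return pop[i]
    return pop[contenders[-1]]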

Crossover
Now that we know how to select two parents, how do we create children? Again, there are many techniques here. The first one (1) is the one-point crossover (simple crossover). I will illustrate the following examples with bit-represented individuals.

                    v
0000 0000 0000 0000 | 0000 0000
1111 1111 1111 1111 | 1111 1111

Creates two children:

                    v
0000 0000 0000 0000 | 1111 1111
1111 1111 1111 1111 | 0000 0000


The crossover point can be randomly chosen, or can be a fixed location (1/4, 1/3, 1/2 are commonly used). After the crossover point, two children are created by swapping their bits.

The second (2) crossover method is two-point crossover, and looks a lot like the previous method:

     v                v
0000 | 0000 0000 0000 | 0000 0000
1111 | 1111 1111 1111 | 1111 1111

Creates two children:

     v                v
0000 | 1111 1111 1111 | 0000 0000
1111 | 0000 0000 0000 | 1111 1111


Again: it's swap - and swap.

Cut and splice (3) is another one, and is only interesting when you need variable-length individuals. I will skip the description, it's in the Wikipedia page (all sources are mentioned at the end of this post).

UX (Uniform Crossover) (4) is a bit more interesting:

1111 1111 1111 0000 0000 0000
1111 1111 1111 1111 1111 1111

Creates two children:

1111 1111 1111 0010 1101 0100
1111 1111 1111 1101 0010 1011


It works as follows: every bit has a 0.5 chance (50%) of swapping. Of course, if both parents have the same bit value 0 or 1, it stays 0 or 1.

HUX (Half Uniform Crossover) (5) swaps exactly half of the non-matching bits. So pick N/2 bits out of N non-matching bits, and swap them.

In our reference article, a single fixed crossover point is used, placed in the middle.
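
For completeness, a sketch of one-point and uniform crossover on list-encoded individuals (function names are mine):

from random import randint, random

def one_point_crossover(p1, p2, point=None):
    # swap everything after a single (random or fixed) point
    if point is None:
        point = randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_crossover(p1, p2):
    # every position swaps with a 0.5 chance
    c1, c2 = list(p1), list(p2)
    for i in range(len(c1)):
        if random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2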

Mutation
Now that that's out of the way, there is only one aspect left to look at: mutation. This phase makes sure that there is enough randomness and diversity in the population. Again, there are different options.

First (1) of all: no mutation. For example, when there is enough diversity thanks to smart selection and crossover methods, or when the optimization problem is such that there are no local maxima (more about that a bit later).

A second (2) method: take N individuals out of our population, and change each of their bits/values with a probability P. Often N is equal to the complete population size, with P a very small value. Or:

(3) Take N individuals, pick K bits/values in every individual, and change those bits/values. Again: N is often equal to the complete population, and K also low (one for example). This is the method used in the article.

Sometimes the following additional (4) method is also used: create offspring equal to (P-N), with P being the population size and 0<=N<=P, then add N random individuals. When this method is used, values like N=P/5 to P/10 are common. A sketch of method (2) follows below.
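
A sketch of method (2) for our list-of-integers encoding (names and defaults are mine):

from random import randint, random

def mutate(pop, p=0.01, min_val=0, max_val=100):
    # visit every value of every individual and re-randomize it
    # with a small probability p
    for ind in pop:
        for i in range(len(ind)):
            if random() < p:
                ind[i] = randint(min_val, max_val)
    return pop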

In Action
That's it, we're done! Let's see a few examples of genetic algorithms in action.

One problem involves: given a constrained plane and a few randomly placed circles, how can we place another circle so that its radius is maximal, without overlapping the other circles? You can download and try it for yourself here (all sites are also mentioned at the end of this post).

Before the evolution starts, the situation looks like this:

After only 61 generations, we are coming close to the optimum:


Another really cool example can be found here. Be sure to download the binary and test it for yourself:


This site implements a Travelling Salesman Solver with a GA Java applet:


Finally, using the code from the article:
import sys
sys.path.append("C:\Users\Seppe\Desktop")
from genetic import *
target = 300
p_count = 100
i_length = 5
i_min = 0
i_max = 100
p = population(p_count, i_length, i_min, i_max)
fitness_history = [grade(p, target),]
for i in xrange(100):
  p = evolve(p, target)
  fitness_history.append(grade(p, target))

for datum in fitness_history:
  print datum

Outputs:
66.51
28.84
19.41
18.66
11.97
13.26
5.41
1.15
1.5
1.55
2.9
3.0
0.3
0.0
0.0
0.0
...

After 14 evolutions we already see a perfect population, not bad... Do note, however, that this is an extremely easy problem: there are many optimal solutions.

Remarks and Problems
Genetic algorithms are not perfect, nor are they a silver bullet. When badly configured, they indeed tend to expose the same flaw as stochastic hill climbing: a tendency to converge towards local optima.

Consider the following function:

It's not hard to construct a GA which will tend toward the global maximum, even without mutation or introducing much diversity. Consider the following animation, with the green dots representing a few individuals:

However, when you consider the following function:

We might get lucky and end up with:

Notice that there is one individual tending towards the local maximum, but the others "pull" it towards the global one. However, the following could also happen:

To prevent these situations from happening, we have various options at our disposal:
(1) Choose sensible parameters: population count, selection-, crossover- and mutation-methods. Try to find a balance between enough diversity and fast convergence.
(2) Run the GA multiple times, take the best solution.
(3) If the population converges towards a certain solution, randomly regenerate the N worst members of the population and continue evolution (a similar technique could be used in the mutation stage).
(4) Use parallel solutions. A parallel genetic algorithm executes multiple populations at once, often on many computers (or a cluster). Often, there is an extra migration stage added, in which individuals migrate from one population to another.

Sources
Wow, what a lot of text! If you're interested in knowing more, check the following links:
  • The original article which inspired this text.
  • Wikipedia also has lots of information: Genetic algorithms, Selection (Roulette and Tournament), Crossover and Mutation.
  • An old tutorial on ai-junkie (but still useful). Back in the day, this was the first thing I read about genetic algorithms.
  • Another old website, made by Marek Obitko, still, the information contained is relevant and interesting. Also hosts the TSP Java applet.
  • This opendir contains university slides about emerging computing technologies. A032 talks about genetic algorithms. The slides are ugly as hell, but contain some good information! (There is a part about CHC Eshelman, which we will discuss in the next part in this series).
  • This Powerpoint file contains some information about CHC Eshelman as well. The file is not that interesting, but does contain a good example on crossover feasibility: sometimes our individuals are encoded in such a way that they can become invalid after a crossover. We must then "normalize" or correct parents or children to produce valid offspring. We mentioned this problem in the post above.
  • Evolution of Mona Lisa, check this!
There are a lot of software implementations around for genetic algorithms, most of them are written in C, C++ or FORTRAN77, but recently languages such as Java and Python are becoming more popular.
Check Wikipedia's External links for more tutorials and software libraries.

In the next section: we will explore a particular GA implementation: CHC by Eshelman.




-----
Table Of Contents (click a link to jump to that post)