Monte Carlo Tree Searching UCT implementation

Makers_F picture Makers_F · Jan 29, 2012 · Viewed 22.5k times · Source

Can you explain me how to build the tree?

I quite understood how the nodes are chosen, but a nicer explanation would really help me implementing this algorithm. I already have a board representing the game state, but I don't know (understand) how to generate the tree.

Can someone points me to a well commented implementation of the algorithm (I need to use it for AI)? Or better explanation/examples of it?

I didn't found a lot of resources on the net, this algorithm is rather new...

Answer

danielbeard picture danielbeard · Jan 30, 2012

The best way to generate the tree is a series of random playouts. The trick is being able to balance between exploration and exploitation (this is where the UCT comes in). There are some good code samples and plenty of research paper references here : https://web.archive.org/web/20160308043415/http://mcts.ai:80/index.html

When I implemented the algorithm, I used random playouts until I hit an end point or termination state. I had a static evaluation function that would calculate the payoff at this point, then the score from this point is propagated back up the tree. Each player or "team" assumes that the other team will play the best move for themselves, and the worst move possible for their opponent.

I would also recommend checking out the papers by Chaslot and his phd thesis as well as some of the research that references his work (basically all the MCTS work since then).


For example: Player 1's first move could simulate 10 moves into the future alternating between player 1 moves and player 2 moves. Each time you must assume that the opposing player will try to minimize your score whilst maximizing their own score. There is an entire field based on this known as Game Theory. Once you simulate to the end of 10 games, you iterate from the start point again (because there is no point only simulating one set of decisions). Each of these branches of the tree must be scored where the score is propagated up the tree and the score represents the best possible payoff for the player doing the simulating assuming that the other player is also choosing the best moves for themselves.

MCTS consists of four strategic steps, repeated as long as there is time left. The steps are as follows.

  1. In the selection step the tree is traversed from the root node until we reach a node E, where we select a position that is not added to the tree yet.

  2. Next, during the play-out step moves are played in self-play until the end of the game is reached. The result R of this “simulated” game is +1 in case of a win for Black (the first player in LOA), 0 in case of a draw, and −1 in case of a win for White.

  3. Subsequently, in the expansion step children of E are added to the tree.

  4. Finally, R is propagated back along the path from E to the root node in the backpropagation step. When time is up, the move played by the program is the child of the root with the highest value. (This example is taken from this paper - PDF

www.ru.is/faculty/yngvi/pdf/WinandsBS08.pdf

Here are some implementations:

A list of libraries and games using some MCTS implementations http://senseis.xmp.net/?MonteCarloTreeSearch

and a game independent open source UCT MCTS library called Fuego http://fuego.sourceforge.net/fuego-doc-1.1/smartgame-doc/group__sguctgroup.html