Minimax explained for an idiot

orokusaki picture orokusaki · Oct 18, 2010 · Viewed 26.8k times · Source

I've wasted my entire day trying to use the minimax algorithm to make an unbeatable tictactoe AI. I missed something along the way (brain fried).

I'm not looking for code here, just a better explanation of where I went wrong.

Here is my current code (the minimax method always returns 0 for some reason):

from copy import deepcopy


class Square(object):
    def __init__(self, player=None):
        self.player = player
    
    @property
    def empty(self):
        return self.player is None


class Board(object):
    winning_combos = (
        [0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6], [1, 4, 7], [2, 5, 8],
        [0, 4, 8], [2, 4, 6],
    )
    
    def __init__(self, squares={}):
        self.squares = squares
        for i in range(9):
            if self.squares.get(i) is None:
                self.squares[i] = Square()
    
    @property
    def available_moves(self):
        return [k for k, v in self.squares.iteritems() if v.empty]
    
    @property
    def complete(self):
        for combo in self.winning_combos:
            combo_available = True
            for pos in combo:
                if not pos in self.available_moves:
                    combo_available = False
            if combo_available:
                return self.winner is not None
        return True
    
    @property
    def player_won(self):
        return self.winner == 'X'
    
    @property
    def computer_won(self):
        return self.winner == 'O'
    
    @property
    def tied(self):
        return self.complete == True and self.winner is None
    
    @property
    def winner(self):
        for player in ('X', 'O'):
            positions = self.get_squares(player)
            for combo in self.winning_combos:
                win = True
                for pos in combo:
                    if pos not in positions:
                        win = False
                if win:
                    return player
        return None
    
    @property
    def heuristic(self):
        if self.player_won:
            return -1
        elif self.tied:
            return 0
        elif self.computer_won:
            return 1
    
    def get_squares(self, player):
        return [k for k,v in self.squares.iteritems() if v.player == player]
    
    def make_move(self, position, player):
        self.squares[position] = Square(player)
    
    def minimax(self, node, player):
        if node.complete:
            return node.heuristic
        a = -1e10000
        for move in node.available_moves:
            child = deepcopy(node)
            child.make_move(move, player)
            a = max([a, -self.minimax(child, get_enemy(player))])
        return a


def get_enemy(player):
    if player == 'X':
        return 'O'
    return 'X'

Answer

Guy Sirton picture Guy Sirton · Oct 18, 2010

Step 1: Build your game tree

Starting from the current board generate all possible moves your opponent can make. Then for each of those generate all the possible moves you can make. For Tic-Tac-Toe simply continue until no one can play. In other games you'll generally stop after a given time or depth.

This looks like a tree, draw it yourself on a piece of paper, current board at top, all opponent moves one layer below, all your possible moves in response one layer below etc.

Step 2: Score all boards at the bottom of the tree

For a simple game like Tic-Tac-Toe make the score 0 if you lose, 50 tie, 100 win.

Step 3: Propagate the score up the tree

This is where the min-max come into play. The score of a previously unscored board depends on its children and who gets to play. Assume both you and your opponent always choose the best possible move at the given state. The best move for the opponent is the move that gives you the worst score. Likewise, your best move is the move that gives you the highest score. In case of the opponent's turn, you choose the child with the minimum score (that maximizes his benefit). If it is your turn you assume you'll make the best possible move, so you choose the maximum.

Step 4: Pick your best move

Now play the move that results in the best propagated score among all your possible plays from the current position.

Try it on a piece of paper, if starting from a blank board is too much for you start from some advanced Tic-Tac-Toe position.

Using recursion: Very often this can be simplified by using recursion. The "scoring" function is called recursively at each depth and depending on whether or not the depth is odd or even it select max or min respectively for all possible moves. When no moves are possible it evaluates the static score of the board. Recursive solutions (e.g. the example code) can be a bit trickier to grasp.